
airbyte-ci: Add pypi publishing logic #34111


Merged: 60 commits, Jan 24, 2024
Changes from 25 commits
Commits
60 commits
1f2e7d5
WIP
Jan 10, 2024
aee5ada
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 11, 2024
dcaef6d
wip
Jan 11, 2024
3cd4f03
make work for setup.py and pyproject.toml
Jan 12, 2024
475b1c5
make publish work
Jan 12, 2024
0360603
fix
Jan 12, 2024
5f6be6c
pass pypi token through
Jan 12, 2024
cf698f2
prepare test
Jan 15, 2024
44f8454
prepare test
Jan 15, 2024
6874dc8
fix bug
Jan 15, 2024
7c3bb25
fix bug
Jan 15, 2024
43cf70a
pass in secret properly
Jan 15, 2024
bd0d636
add debug
Jan 15, 2024
7cc0274
remove debug
Jan 15, 2024
e252b4f
prepare google drive
Jan 15, 2024
439f392
add debug
Jan 15, 2024
ab4bd79
fix bug
Jan 15, 2024
6de583a
fix bug
Jan 15, 2024
08103fc
fix bug
Jan 15, 2024
415dc23
use most up to date metadata
Jan 15, 2024
9e71444
prepare pokeapi
Jan 15, 2024
96a0715
prepare pokeapi
Jan 15, 2024
d6b70d0
debug
Jan 15, 2024
dd7443a
fix
Jan 15, 2024
6063ec7
format
Jan 15, 2024
96e83e5
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 17, 2024
eee6eca
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 18, 2024
2069bbd
review comments
Jan 18, 2024
3c408df
adjust for test pypi publishing
Jan 18, 2024
e06d37d
fix things
Jan 18, 2024
df9502c
fix
Jan 18, 2024
0b6b9de
fix
Jan 18, 2024
a0313ac
revert testing stuff
Jan 18, 2024
638b30c
revert
Jan 18, 2024
ee02e87
fix
Jan 18, 2024
2dd2124
fix
Jan 18, 2024
13fa2a1
fix
Jan 18, 2024
e8877d7
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 18, 2024
da754f2
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 19, 2024
f80878f
review comments
Jan 20, 2024
707f769
review comments
Jan 20, 2024
a0fbf5f
enable pypi
Jan 20, 2024
e082d65
enable pypi second try
Jan 20, 2024
26f680a
enable pypi debug
Jan 20, 2024
0b78a12
fix test
Jan 20, 2024
391b68c
fix test
Jan 20, 2024
bd7c918
fix test
Jan 20, 2024
1a6e0c3
revert pokeapi changes
Jan 20, 2024
671c317
Merge branch 'master' into flash1293/airbyte-ci-python-publish
Jan 23, 2024
bbeeea2
Merge remote-tracking branch 'origin/master' into flash1293/airbyte-c…
Jan 24, 2024
d5c037f
review comments
Jan 24, 2024
a44fdc4
review comments
Jan 24, 2024
faaf6db
test with pokeapi
Jan 24, 2024
6923181
switch order
Jan 24, 2024
c30cb89
format
Jan 24, 2024
84da11f
fix github actions
Jan 24, 2024
004af84
dummy-change
Jan 24, 2024
e8b86ef
dummy-change 2
Jan 24, 2024
4691763
fix variable in github action flow
Jan 24, 2024
f99aaef
revert test changes
Jan 24, 2024
2 changes: 2 additions & 0 deletions .github/workflows/publish_connectors.yml
@@ -71,6 +71,8 @@ jobs:
tailscale_auth_key: ${{ secrets.TAILSCALE_AUTH_KEY }}
subcommand: "connectors ${{ github.event.inputs.connectors-options }} publish ${{ github.event.inputs.publish-options }}"
airbyte_ci_binary_url: ${{ github.event.inputs.airbyte-ci-binary-url }}
env:
PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}

set-instatus-incident-on-failure:
name: Create Instatus Incident on Failure
@@ -14,6 +14,8 @@
from pipelines.airbyte_ci.connectors.publish.context import PublishConnectorContext
from pipelines.airbyte_ci.connectors.reports import ConnectorReport
from pipelines.airbyte_ci.metadata.pipeline import MetadataUpload, MetadataValidation
from pipelines.airbyte_ci.poetry.pipeline import PublishToPyPI, PyPIPublishContext
from pipelines.airbyte_ci.poetry.utils import is_package_published
from pipelines.dagger.actions.remote_storage import upload_to_gcs
from pipelines.dagger.actions.system import docker
from pipelines.models.steps import Step, StepResult, StepStatus
@@ -52,6 +54,22 @@ async def _run(self) -> StepResult:
return StepResult(self, status=StepStatus.SUCCESS, stdout=f"No manifest found for {self.context.docker_image}.")


class CheckPypiPackageExists(Step):
Contributor: Will you reuse this step on the airbyte-lib release? If yes, I suggest moving it outside the connectors package.

Author (@flash1293, Jan 17, 2024): I didn't plan to. When running it via the command, the caller is responsible for this (in the case of airbyte-lib, the GitHub action will bump the version), and surfacing the exact error from PyPI is the best behavior.

context: PyPIPublishContext
title = "Check if the connector is published on pypi"

async def _run(self) -> StepResult:
is_published = is_package_published(self.context.package_name, self.context.version, self.context.test_pypi)
if is_published:
return StepResult(
self, status=StepStatus.SKIPPED, stderr=f"{self.context.package_name} already exists in version {self.context.version}."
)
else:
return StepResult(
self, status=StepStatus.SUCCESS, stdout=f"{self.context.package_name} does not exist in version {self.context.version}."
)
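The implementation of is_package_published is not part of this diff (it lives in pipelines.airbyte_ci.poetry.utils). One plausible shape, sketched here as an assumption rather than the actual helper, uses PyPI's JSON API, where /pypi/<name>/<version>/json returns 200 exactly when that release exists:

```python
# Hypothetical sketch of is_package_published; names and API choice are assumptions.
from urllib.error import HTTPError
from urllib.request import urlopen


def registry_url(package_name: str, version: str, test_pypi: bool) -> str:
    # test.pypi.org and pypi.org expose the same per-release JSON endpoint.
    host = "https://test.pypi.org" if test_pypi else "https://pypi.org"
    return f"{host}/pypi/{package_name}/{version}/json"


def is_package_published(package_name: str, version: str, test_pypi: bool) -> bool:
    try:
        with urlopen(registry_url(package_name, version, test_pypi)) as response:
            return response.status == 200
    except HTTPError as error:
        if error.code == 404:
            # Unknown package or unknown version: not published yet.
            return False
        raise
```

A 404 maps to "not published", which is what lets the step above return SUCCESS and hand off to the publish step.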


class PushConnectorImageToRegistry(Step):
context: PublishConnectorContext
title = "Push connector image to registry"
@@ -273,6 +291,22 @@ def create_connector_report(results: List[StepResult]) -> ConnectorReport:
metadata_upload_results = await metadata_upload_step.run()
results.append(metadata_upload_results)

# Try to convert the context to a PyPIPublishContext. If it returns None, it means we don't need to publish to pypi.
Contributor: Shall we mark the related steps as skipped and log a message when conversion can't happen?

Author: Currently the behavior is:

  • If PyPI is not enabled, the steps are not shown at all.
  • If PyPI is enabled but the package is already published, the steps are shown as skipped.

My reasoning was that it might be unexpected to show PyPI-related steps on connector reports that have nothing to do with PyPI. This assumes the following semantics for "skipped": "Normally this step would be performed, but an edge case prevented us from doing so; you might want to look into that."

No strong opinion on this though; we can also always show them as "skipped" if you think that makes sense.

Contributor: Agreed, it makes no sense to show skipped PyPI steps for a Java connector.

Contributor: Can you add unit tests in this module to check the behavior following your addition?

pypi_context = await PyPIPublishContext.from_connector_context(context)
if pypi_context:
check_pypi_package_exists_results = await CheckPypiPackageExists(pypi_context).run()
results.append(check_pypi_package_exists_results)
if check_pypi_package_exists_results.status is StepStatus.SKIPPED:
context.logger.info("The connector version is already published on pypi.")
elif check_pypi_package_exists_results.status is StepStatus.SUCCESS:
context.logger.info("The connector version is not published on pypi. Let's build and publish it.")
publish_to_pypi_results = await PublishToPyPI(pypi_context).run()
results.append(publish_to_pypi_results)
if publish_to_pypi_results.status is StepStatus.FAILURE:
return create_connector_report(results)
elif check_pypi_package_exists_results.status is StepStatus.FAILURE:
return create_connector_report(results)

# Exit early if the connector image already exists or has failed to build
if check_connector_image_results.status is not StepStatus.SUCCESS:
return create_connector_report(results)
@@ -0,0 +1,3 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
@@ -0,0 +1,102 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

"""
Module exposing the format commands.
"""
from __future__ import annotations

import logging
import sys
from typing import Any, Dict, List, Optional

import asyncclick as click
from pipelines.airbyte_ci.format.configuration import FORMATTERS_CONFIGURATIONS, Formatter
from pipelines.airbyte_ci.format.format_command import FormatCommand
from pipelines.cli.click_decorators import click_ignore_unused_kwargs, click_merge_args_into_context_obj
from pipelines.cli.dagger_pipeline_command import DaggerPipelineCommand
from pipelines.helpers.cli import LogOptions, invoke_commands_concurrently, invoke_commands_sequentially, log_command_results
from pipelines.models.contexts.click_pipeline_context import ClickPipelineContext, pass_pipeline_context
from pipelines.models.steps import StepStatus

from .pipeline import PublishToPyPI, PyPIPublishContext


@click.group(
name="poetry",
help="Commands related to running poetry commands.",
)
@click.option(
"--package-path",
help="The path to publish",
type=click.STRING,
required=True,
)
@click.option("--docker-image", help="The docker image to run the command in.", type=click.STRING, default="mwalbeck/python-poetry")
@click_merge_args_into_context_obj
@pass_pipeline_context
@click_ignore_unused_kwargs
async def poetry(pipeline_context: ClickPipelineContext) -> None:
pass


@poetry.command(cls=DaggerPipelineCommand, name="publish", help="Publish a Python package to PyPI.")
@click.option(
"--pypi-token",
help="Access token",
type=click.STRING,
required=True,
envvar="PYPI_TOKEN",
)
@click.option(
"--test-pypi",
help="Whether to publish to test.pypi.org instead of pypi.org.",
type=click.BOOL,
is_flag=True,
default=False,
)
@click.option(
"--publish-name",
help="The name of the package to publish. If not set, the name will be inferred from the pyproject.toml file of the package.",
type=click.STRING,
)
@click.option(
"--publish-version",
help="The version of the package to publish. If not set, the version will be inferred from the pyproject.toml file of the package.",
type=click.STRING,
)
Contributor: What are the use cases for these options? Do we ever want to override a package name / version?

Author: This is a kind of "plumbing" option which allows publishing a connector from a local machine without going through the regular pipeline.

@pass_pipeline_context
@click.pass_context
async def publish(
ctx: click.Context,
click_pipeline_context: ClickPipelineContext,
pypi_token: str,
test_pypi: bool,
publish_name: Optional[str],
publish_version: Optional[str],
) -> None:
context = PyPIPublishContext(
is_local=ctx.obj["is_local"],
git_branch=ctx.obj["git_branch"],
git_revision=ctx.obj["git_revision"],
ci_report_bucket=ctx.obj["ci_report_bucket_name"],
report_output_prefix=ctx.obj["report_output_prefix"],
gha_workflow_run_url=ctx.obj.get("gha_workflow_run_url"),
dagger_logs_url=ctx.obj.get("dagger_logs_url"),
pipeline_start_timestamp=ctx.obj.get("pipeline_start_timestamp"),
ci_context=ctx.obj.get("ci_context"),
ci_gcs_credentials=ctx.obj["ci_gcs_credentials"],
pypi_token=pypi_token,
test_pypi=test_pypi,
package_path=ctx.obj["package_path"],
build_docker_image=ctx.obj["docker_image"],
package_name=publish_name,
version=publish_version,
)
dagger_client = await click_pipeline_context.get_dagger_client(pipeline_name=f"Publish {ctx.obj['package_path']} to PyPI")
context.dagger_client = dagger_client

await PublishToPyPI(context).run()

@@ -0,0 +1,209 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

import configparser
import io
import os
from textwrap import dedent
from typing import Optional

import tomli
import tomli_w
import yaml
from pipelines.airbyte_ci.connectors.context import ConnectorContext, PipelineContext
from pipelines.models.contexts.pipeline_context import PipelineContext
from pipelines.models.steps import Step, StepResult


class PyPIPublishContext(PipelineContext):
def __init__(
self,
pypi_token: str,
test_pypi: bool,
package_path: str,
build_docker_image: str,
ci_report_bucket: str,
report_output_prefix: str,
is_local: bool,
git_branch: str,
git_revision: str,
gha_workflow_run_url: Optional[str] = None,
dagger_logs_url: Optional[str] = None,
pipeline_start_timestamp: Optional[int] = None,
ci_context: Optional[str] = None,
ci_gcs_credentials: Optional[str] = None,
package_name: Optional[str] = None,
version: Optional[str] = None,
):
self.pypi_token = pypi_token
self.test_pypi = test_pypi
self.package_path = package_path
self.package_name = package_name
self.version = version
self.build_docker_image = build_docker_image

pipeline_name = f"Publish PyPI {package_path}"

super().__init__(
pipeline_name=pipeline_name,
report_output_prefix=report_output_prefix,
ci_report_bucket=ci_report_bucket,
is_local=is_local,
git_branch=git_branch,
git_revision=git_revision,
gha_workflow_run_url=gha_workflow_run_url,
dagger_logs_url=dagger_logs_url,
pipeline_start_timestamp=pipeline_start_timestamp,
ci_context=ci_context,
ci_gcs_credentials=ci_gcs_credentials,
)

@staticmethod
async def from_connector_context(connector_context: ConnectorContext) -> Optional["PyPIPublishContext"]:
"""
Create a PyPIPublishContext from a ConnectorContext.

The metadata of the connector is read from the current workdir to capture changes that are not yet published.
If pypi is not enabled, this will return None.
"""

current_metadata = yaml.safe_load(
Contributor: I think you can use connector_context.connector.metadata.

Author: This doesn't work because connector_context.connector.metadata is what got published the last time, not the current state in the repo, so it wouldn't publish on the first merge that sets remoteRegistries.pypi.enabled to true.

Contributor: I conceptually prefer your approach because it loads the file from a dagger Directory. But I think you're wrong that connector_context.connector.metadata is what got published the last time: the metadata property reads the metadata file's current state from the filesystem. My point is that I'd prefer this method to use the common interface we've declared for accessing metadata. Eventually, though, the metadata access should be refactored to use the get_repo_file method internally.

Author: Good point. I could have sworn I ran into problems with this, which is why I changed it, but maybe I confused myself. I will look into it.

await connector_context.get_repo_file(str(connector_context.connector.metadata_file_path)).contents()
)["data"]
print(current_metadata)
if (
"remoteRegistries" not in current_metadata
or "pypi" not in current_metadata["remoteRegistries"]
or not current_metadata["remoteRegistries"]["pypi"]["enabled"]
):
return None

if "connectorBuildOptions" in current_metadata and "baseImage" in current_metadata["connectorBuildOptions"]:
build_docker_image = current_metadata["connectorBuildOptions"]["baseImage"]
else:
build_docker_image = "mwalbeck/python-poetry"
pypi_context = PyPIPublishContext(
pypi_token=os.environ["PYPI_TOKEN"],
test_pypi=True, # TODO: Go live
package_path=str(connector_context.connector.code_directory),
package_name=current_metadata["remoteRegistries"]["pypi"]["packageName"],
version=current_metadata["dockerImageTag"],
build_docker_image=build_docker_image,
ci_report_bucket=connector_context.ci_report_bucket,
report_output_prefix=connector_context.report_output_prefix,
is_local=connector_context.is_local,
git_branch=connector_context.git_branch,
git_revision=connector_context.git_revision,
gha_workflow_run_url=connector_context.gha_workflow_run_url,
dagger_logs_url=connector_context.dagger_logs_url,
pipeline_start_timestamp=connector_context.pipeline_start_timestamp,
ci_context=connector_context.ci_context,
ci_gcs_credentials=connector_context.ci_gcs_credentials,
)
pypi_context.dagger_client = connector_context.dagger_client
return pypi_context
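The gating logic above can be condensed into a small predicate. This is a paraphrase for illustration, not a helper that exists in the PR:

```python
def pypi_enabled(metadata: dict) -> bool:
    # from_connector_context returns None (no PyPI publish) unless the connector's
    # metadata declares remoteRegistries.pypi.enabled: true.
    return bool(metadata.get("remoteRegistries", {}).get("pypi", {}).get("enabled"))
```

Using .get chains with empty-dict defaults makes the check safe for connectors whose metadata has no remoteRegistries section at all.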


class PublishToPyPI(Step):
Contributor: I'd appreciate some kind of integration test that would publish to test PyPI.

context: PyPIPublishContext
title = "Publish package to PyPI"

async def _run(self) -> StepResult:
dir_to_publish = await self.context.get_repo_dir(self.context.package_path)

files = await dir_to_publish.entries()
is_poetry_package = "pyproject.toml" in files
is_pip_package = "setup.py" in files

# Try to infer package name and version from the pyproject.toml file. If it is not present, we need to have the package name and version set
# Setup.py packages need to set package name and version as parameter
if not self.context.package_name or not self.context.version:
if not is_poetry_package:
return self.skip(
"Connector does not have a pyproject.toml file and version and package name are not set otherwise, skipping."
)

pyproject_toml = dir_to_publish.file("pyproject.toml")
pyproject_toml_content = await pyproject_toml.contents()
contents = tomli.loads(pyproject_toml_content)
if (
"tool" not in contents
or "poetry" not in contents["tool"]
or "name" not in contents["tool"]["poetry"]
or "version" not in contents["tool"]["poetry"]
):
return self.skip(
"Connector does not have a pyproject.toml file which specifies package name and version and they are not set otherwise, skipping."
)

self.context.package_name = contents["tool"]["poetry"]["name"]
self.context.version = contents["tool"]["poetry"]["version"]

print(
f"Uploading package {self.context.package_name} version {self.context.version} to {'testpypi' if self.context.test_pypi else 'pypi'}..."
)

if is_pip_package:
# legacy publish logic
pypi_username = self.context.dagger_client.set_secret("pypi_username", "__token__")
Contributor: 💡 Given that we are following a simple convention of

  1. Choose docker image
  2. Set secrets
  3. Mount paths
  4. Run command

you may want to look at SimpleDockerStep. Here's an example of it in use.

Author: I tried to convert it this way, but due to the branching and some extras like writing certain files it got messy, and IMHO it's clearer when cleaned up the way @alafanechere described above. I'm still working through some comments; let me finish that and look at the final result, happy to do another pass.

pypi_password = self.context.dagger_client.set_secret("pypi_password", f"pypi-{self.context.pypi_token}")
metadata = {
"name": self.context.package_name,
"version": self.context.version,
"author": "Airbyte",
"author_email": "[email protected]",
}
Contributor: This can be pulled from the setup file 🙏

Author: You mean author and author_email? If so, I did that on purpose to enforce consistency. Added a comment for this.

if "README.md" in files:
metadata["long_description"] = await dir_to_publish.file("README.md").contents()
metadata["long_description_content_type"] = "text/markdown"

config = configparser.ConfigParser()
config["metadata"] = metadata

setup_cfg_io = io.StringIO()
config.write(setup_cfg_io)
setup_cfg = setup_cfg_io.getvalue()

twine_upload = (
self.context.dagger_client.container()
.from_(self.context.build_docker_image)
.with_exec(["apt-get", "update"])
.with_exec(["apt-get", "install", "-y", "twine"])
.with_directory("package", dir_to_publish)
.with_workdir("package")
# clear out setup.py metadata so setup.cfg is used
.with_exec(["sed", "-i", "/name=/d; /author=/d; /author_email=/d; /version=/d", "setup.py"])
.with_new_file("setup.cfg", contents=setup_cfg)
.with_exec(["pip", "install", "--upgrade", "setuptools", "wheel"])
.with_exec(["python", "setup.py", "sdist", "bdist_wheel"])
.with_secret_variable("TWINE_USERNAME", pypi_username)
.with_secret_variable("TWINE_PASSWORD", pypi_password)
.with_exec(["twine", "upload", "--verbose", "--repository", "testpypi" if self.context.test_pypi else "pypi", "dist/*"])
)

return await self.get_step_result(twine_upload)
else:
# poetry publish logic
pypi_token = self.context.dagger_client.set_secret("pypi_token", f"pypi-{self.context.pypi_token}")
pyproject_toml = dir_to_publish.file("pyproject.toml")
pyproject_toml_content = await pyproject_toml.contents()
contents = tomli.loads(pyproject_toml_content)
# make sure package name and version are set to the configured one
contents["tool"]["poetry"]["name"] = self.context.package_name
contents["tool"]["poetry"]["version"] = self.context.version
poetry_publish = (
self.context.dagger_client.container()
.from_(self.context.build_docker_image)
.with_secret_variable("PYPI_TOKEN", pypi_token)
.with_directory("package", dir_to_publish)
.with_workdir("package")
.with_new_file("pyproject.toml", contents=tomli_w.dumps(contents))
.with_exec(["poetry", "config", "repositories.testpypi", "https://test.pypi.org/legacy/"])
Contributor: Do we always want to configure the test repo?

Author: It doesn't seem to hurt to do so.

.with_exec(
["sh", "-c", f"poetry config {'pypi-token.testpypi' if self.context.test_pypi else 'pypi-token.pypi'} $PYPI_TOKEN"]
)
.with_exec(["poetry", "publish", "--build", "--repository", "testpypi" if self.context.test_pypi else "pypi"])
)

return await self.get_step_result(poetry_publish)