Skip to content

🚀 Add files source to CDK + generator template #14410

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 25 commits into from

Conversation

Phlair
Copy link
Contributor

@Phlair Phlair commented Jul 5, 2022

What

Solving #5111 by adding a files source to the cdk, alongside http.
Also added a generator template.

Currently leaving this as draft until I prove the concept by rebuilding the S3 connector using this (publishing cdk here to test). Possibly also build another source (google drive maybe?)

Keen for thoughts on:

  • I've made this additive by separating the dependencies in setup.py, e.g. "airbyte-cdk[files-source]~=0.2.0",
  • Bumped minor version as this seems like a significant addition but I could foresee that potentially being problematic as most connectors specify ~=0.1.x

How

Added generator template
This is all new so would be most useful to have some eyes on reviewing this
I still need to add the relevant scaffold template too.

Added an abstract_files_source under sources in cdk.
Added a folder for files next to http in cdk.
Mostly these are a lift and shift from the abstract files source in source-s3 but made some small improvements/changes where it made sense.

Added suitable unit tests for abstract files classes and formats.
These are also mostly lift and shift although probably more changes here as they were a bit jumbled between abstract and S3-specific in source-s3

For now, I've left source-s3 as is, but the idea would be to update that to use these ABCs from the cdk added here.

@Phlair Phlair requested a review from a team July 5, 2022 12:57
@github-actions github-actions bot added the CDK Connector Development Kit label Jul 5, 2022
@Phlair Phlair requested a review from sherifnada July 5, 2022 12:57
@Phlair Phlair marked this pull request as draft July 5, 2022 12:57
@Phlair Phlair requested a review from girarda July 5, 2022 13:00
@Phlair Phlair self-assigned this Jul 5, 2022
@sherifnada sherifnada linked an issue Jul 7, 2022 that may be closed by this pull request
Copy link
Contributor

@sherifnada sherifnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

template LGTM, I wasn't able to tell which things changed in files source, if you need a code review, could you point those out?

from airbyte_cdk.sources.streams import Stream
from wcmatch.glob import GLOBSTAR, SPLIT, globmatch

# ideas on extending this to handle multiple streams:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given our inability to evolve schemas we either do this after schema migration is available or we do it now. Feels like a non-trivial lift to do it now? so maybe let's keep it as is? WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, is it that big of a lift to take that whole part of the config and put it inside a list ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess wrt S3 it leaves us with the folllowing options:

  • don't port S3 to CDK-based abstract files source
  • port s3 and introduce a shim spec.json and convert between that and this new AbstractfileSource's spec
  • don't do it for now

setup(
name="airbyte-cdk",
version="0.1.62",
version="0.2.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably just move to proper semver i.e: 1.X.X at this point. Nothing stopping us from doing it. Either we do it now (and update connectors to match) or we just keep patching

# description="In order to access private storage, this connector requires credentials with the correct permissions."
# airbyte_secret=True,
# order=1,
#)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#)
#)
# See https://docs.airbyte.io/connector-development/connector-specification-reference for specification reference docs

@@ -0,0 +1,9 @@
id,name,valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this folder/file?

@sherifnada
Copy link
Contributor

Pausing work on this, will reopen if resuming

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make the abstract files source framework available in the cdk
3 participants