-
Notifications
You must be signed in to change notification settings - Fork 4.6k
🚀 Add files source to CDK + generator template #14410
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
template LGTM, I wasn't able to tell which things changed in files source, if you need a code review, could you point those out?
from airbyte_cdk.sources.streams import Stream | ||
from wcmatch.glob import GLOBSTAR, SPLIT, globmatch | ||
|
||
# ideas on extending this to handle multiple streams: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given our inability to evolve schemas we either do this after schema migration is available or we do it now. Feels like a non-trivial lift to do it now? so maybe let's keep it as is? WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, is it that big of a lift to take that whole part of the config and put it inside a list
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess wrt S3 it leaves us with the folllowing options:
- don't port S3 to CDK-based abstract files source
- port s3 and introduce a shim spec.json and convert between that and this new
AbstractfileSource
's spec - don't do it for now
setup( | ||
name="airbyte-cdk", | ||
version="0.1.62", | ||
version="0.2.0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should probably just move to proper semver i.e: 1.X.X at this point. Nothing stopping us from doing it. Either we do it now (and update connectors to match) or we just keep patching
# description="In order to access private storage, this connector requires credentials with the correct permissions." | ||
# airbyte_secret=True, | ||
# order=1, | ||
#) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#) | |
#) | |
# See https://docs.airbyte.io/connector-development/connector-specification-reference for specification reference docs |
@@ -0,0 +1,9 @@ | |||
id,name,valid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this folder/file?
# Conflicts: # airbyte-cdk/python/setup.py
Pausing work on this, will reopen if resuming |
What
Solving #5111 by adding a
files
source to the cdk, alongside http.Also added a generator template.
Currently leaving this as draft until I prove the concept by rebuilding the S3 connector using this (publishing cdk here to test). Possibly also build another source (google drive maybe?)
Keen for thoughts on:
setup.py
, e.g. "airbyte-cdk[files-source]~=0.2.0",How
Added generator template
This is all new so would be most useful to have some eyes on reviewing this
I still need to add the relevant scaffold template too.
Added an
abstract_files_source
under sources in cdk.Added a folder for
files
next tohttp
in cdk.Mostly these are a lift and shift from the abstract files source in source-s3 but made some small improvements/changes where it made sense.
Added suitable unit tests for abstract files classes and formats.
These are also mostly lift and shift although probably more changes here as they were a bit jumbled between abstract and S3-specific in source-s3
For now, I've left source-s3 as is, but the idea would be to update that to use these ABCs from the cdk added here.