Update error messaging/type for missing streams. Note: version mismatch, please use 0.78.9 instead
low-code: add backward compatibility for old close slice behavior
low-code: fix stop_condition instantiation in the cursor pagination
low-code: Add last_record and last_page_size interpolation variables to pagination
Fix dependencies for file-based extras
low-code: fix retrieving partition key for legacy state migration
connector-builder: return full url-encoded URL instead of separating parameters
low-code: Allow state migration with CustomPartitionRouter
Emit state recordCount as float instead of integer
Fix empty , , extras packages
low-code: Add string interpolation filter
Migrate Python CDK to Poetry
low-code: Add StateMigration component
Request option params are allowed to be an array
set minimum python version to 3.9
Connector Builder: have schema fields be nullable by default except for PK and cursor field
low-code: add refresh_token_error handler to DeclarativeOauth2Authenticator
low-code: Allow defining custom schema loaders
Declarative datetime-based cursors now only derive state values from records that were read
low-code: remove superfluous sleep
File-based CDK: Fix tab delimiter configuration in CSV file type
testing
low-code: improve error message when a custom component cannot be found
Update mock server test entrypoint wrapper to use per-stream state
Include recordCount in stream state messages and final state message for full refresh syncs
low-code: update cartesian stream slice to emit typed StreamSlice
Low-code: adding a default value if a stream slice is None during read_records
low-code: remove parent cursor component from incremental substreams' state message
no-op republish of 0.68.0
low-code: Allow page size to be defined with string interpolation
CDK: upgrade pyarrow
File CDK: Update parquet parser to handle values that resolve to None
Fix handling of tab-separated CSVs
Low-code: Add CustomRecordFilter
Low-code: Add interpolation for request options
low-code: Allow connectors to ignore stream slicer request options on paginated requests
Low-code: Add filter to RemoveFields
Correct handling of custom max_records limits in connector_builder
File-based CDK: fix record enqueuing
Per-stream error reporting and continue syncing on error by default
mask access key when logging refresh response
[ISSUE #34910] add headers to HttpResponse for test framework
File-based CDK: functionality to make incremental syncs concurrent
[ISSUE #34755] do not propagate parameters on JSON schemas
Align version in CDK Dockerfile to be consistent. Before this change, the Docker image was mistakenly pinned to version 0.58.5.
File-based CDK: log warning on no sync mode instead of raising exception
Improve error messages for concurrent CDK
Emit state when no partitions are generated for ccdk and update StateBuilder
File-based CDK: run full refresh syncs with concurrency
Fix CCDK overlapping message due to print in entrypoint
Fix concurrent CDK deadlock
Fix state message handling when running concurrent syncs
concurrent-cdk: improve resource usage when reading from substreams
CDK: HttpRequester can accept http_method in str format, which is required by custom low code components
File CDK: Added logic to emit logged RecordParseError errors and raise a single AirbyteTracebackException at the end of the sync, instead of silently skipping the parsing errors. PR: #32589
Handle private network exception as config error
Add POST method to HttpMocker
fix declarative oauth initialization
Integration tests: adding debug mode to improve logging
Add schema normalization to declarative stream
Concurrent CDK: add state converter for ISO timestamps with millisecond granularity
add SelectiveAuthenticator
File CDK: Support raw txt file
Adding more tooling to cover source-stripe events stream
Raise error on passing unsupported value formats as query parameters
Vector DB CDK: Refactor embedders, File based CDK: Handle 422 errors properly in document file type parser
Vector DB CDK: Refactor embedders, File based CDK: Handle 422 errors properly in document file type parser
Update airbyte-protocol
Improve integration tests tooling
low-code: cache requests sent for parent streams
File-based CDK: Add support for automatic primary key for document file type format
File-based CDK: Add support for remote parsing of document file type format via API
Vector DB CDK: Fix bug with embedding tokens with special meaning like <|endoftext|>
no-op to verify pypi publish flow
Allow for connectors to continue syncing when a stream fails
File-based CDK: hide source-defined primary key; users can define primary keys in the connection's configuration
Source Integration tests: decoupling entrypoint wrapper from pytest
First iteration of integration tests tooling (http mocker and response builder)
concurrent-cdk: factory method initializes concurrent source with default number of max tasks
Vector DB CDK: Add omit_raw_text flag
concurrent cdk: read multiple streams concurrently
low-code: fix injection of page token if first request
Fix generation of the error message using _try_get_error based on a list of errors
Vector DB CDK: Remove CDC records, File CDK: Update unstructured parser
low-code: fix debug logging when using --debug flag
Increase maximum_attempts_to_acquire to avoid crashing in acquire_call
File CDK: Improve stream config appearance
Concurrent CDK: fix futures pruning
Fix spec schema generation for File CDK and Vector DB CDK and allow skipping invalid files in document file parser
Concurrent CDK: Increase connection pool size to allow for 20 max workers
Concurrent CDK: Improve handling of futures to avoid memory leak and improve performance
Add call rate functionality
Fix class SessionTokenAuthenticator for CLASS_TYPES_REGISTRY mapper
File CDK: Improve file type detection in document file type parser
Concurrent CDK: incremental (missing state conversion). Outside of concurrent-specific work, this includes the following changes:
- Checkpointing state was acting on the number of records per slice. This has been changed to consider the number of records per sync.
- Source.read_state and Source._emit_legacy_state_format are now classmethods to allow developers to access the state before instantiating the source.
File CDK: Add pptx support
make parameter not required for default backoff handler
use in-memory cache if no file path is provided
File CDK: Add unstructured parser
Update source-declarative-manifest base image to update Linux alpine and Python
Add max time for backoff handler
File CDK: Add CustomFileBasedException for custom errors
low-code: Allow connector developers to specify the type of an added field
concurrent cdk: fail fast if a partition raises an exception
File CDK: Avoid listing all files for check command
Vector DB CDK: Expose stream identifier logic, add field remapping to processing | File CDK: Emit analytics message for used streams
Add filters for base64 encode and decode in Jinja Interpolation
Few bug fixes for concurrent cdk
Add ability to wrap HTTP errors with specific status codes that occurred during access token refresh into AirbyteTracedException
Enable debug logging when running availability check
Enable debug logging when running availability check
File CDK: Allow configuring number of tested files for schema inference and parsability check
Vector DB CDK: Fix OpenAI compatible embedder when used without api key
Vector DB CDK: Improve batching process
Introduce experimental ThreadBasedConcurrentStream
Fix initialization of token_expiry_is_time_of_expiration field
Add new token_expiry_is_time_of_expiration property for AbstractOauth2Authenticator to indicate that the token's expiry_in is a time of expiration
Coerce read_records to iterable in http availability strategy
Add functionality enabling Page Number/Offset to be set on the first request
Fix parsing of UUID fields in avro files
Vector DB CDK: Fix OpenAI embedder batch size
Add configurable OpenAI embedder to cdk and add cloud environment helper
Fix previous version of request_cache clearing
Fix request_cache clearing and move it to tmp folder
Vector DB CDK: Adjust batch size for Azure embedder to current limits
Change Error message if Stream is not found
Vector DB CDK: Add text splitting options to document processing
Ensuring invalid user-provided URLs do not generate Sentry issues
Vector DB CDK adjustments: Prevent failures with big records and OpenAI embedder
[ISSUE #30353] File-Based CDK: remove file_type from stream config
Connector Builder: fix datetime format inference for str parsable as int but not isdecimal
Vector DB CDK: Add Azure OpenAI embedder
File-based CDK: improve error message for CSV parsing error
File-based CDK: migrated parsing error to config error to avoid sentry alerts
Add from-field embedder to vector db CDK
File-based CDK: Update spec and fix autogenerated headers with skip after
Vector DB CDK adjustments: Fix id generation, improve config spec, add base test case
[Issue #29660] Support empty keys with record selection
Add vector db CDK helpers
File-based CDK: allow user to provide column names for CSV files
File-based CDK: allow for extension mismatch
File-based CDK: Remove CSV noisy log
Source-S3 V4: feature parity rollout
File-based CDK: Do not stop processing files in slice on error
Check config against spec in embedded sources and remove list endpoint from connector builder module
low-code: allow formatting datetime as milliseconds since unix epoch
File-based CDK: handle legacy options
Fix title and description of datetime_format fields
File-based CDK cursor and entrypoint updates
Low code CDK: Decouple SimpleRetriever and HttpStream
Add utils for embedding sources in other Python applications
Relax pydantic version requirement and update to protocol models version 0.4.0
Support many formats for cursor datetime
File-based CDK updates
Connector Builder: Ensure we return when there are no slices
low-code: deduplicate query params if they are already encoded in the URL
Fix RemoveFields transformation issue
Breaking change: Rename existing SessionTokenAuthenticator to LegacySessionTokenAuthenticator and make SessionTokenAuthenticator more generic
Connector builder: warn if the max number of records was reached
Remove pyarrow from main dependency and add it to extras
Fix pyyaml and cython incompatibility
Connector builder: Show all request/responses as part of the testing panel
[ISSUE #27494] allow for state to rely on transformed field
Ensuring the state value format matches the cursor value from the record
Fix issue with incremental sync following data feed release
Support data feed like incremental syncs
Fix return type of RecordFilter: changed from generator to list
Connector builder module: serialize request body as string
Fix availability check to handle HttpErrors which happen during slice extraction
Refactoring declarative state management
Error message on state per partition state discrepancy
Supporting state per partition given incremental sync and partition router
Use x-www-urlencoded for access token refresh requests
Replace with when making oauth calls
Emit messages using message repository
Add utils for inferring datetime formats
Add a metadata field to the declarative component schema
make DatetimeBasedCursor.end_datetime optional
Remove SingleUseRefreshTokenOAuthAuthenticator from low code CDK and add generic injection capabilities to ApiKeyAuthenticator
Connector builder: add latest connector config control message to read calls
Add refresh token update capabilities to OAuthAuthenticator
Make step and cursor_granularity optional
Improve connector builder error messages
Align schema generation in SchemaInferrer with Airbyte platform capabilities
Allow nested objects in request_body_json
low-code: Make refresh token in oauth authenticator optional
Unfreeze requests version and test new pipeline
low-code: use jinja sandbox and restrict some methods
pin the version of the requests library
Support parsing non UTC dates and Connector Builder set slice descriptor
low-code: fix add field transformation when running from the connector builder
Emit stream status messages
low-code: remove now_local() macro because it's too unpredictable
low-code: alias stream_interval and stream_partition to stream_slice in jinja context
Connector builder scrubs secrets from raw request and response
low-code: Add title, description, and examples for all fields in the manifest schema
low-code: simplify session token authenticator interface
low-code: fix typo in ManifestDeclarativeSource
Emit slice log messages when running the connector builder
set slice and pages limit when reading from the connector builder module
Low-Code CDK: Enable use of SingleUseRefreshTokenAuthenticator
low-code: fix duplicate stream slicer update
Low-Code CDK: make RecordFilter.filter_records a generator
Enable oauth flow for low-code connectors
Remove unexpected error swallowing on abstract source's check method
connector builder: send stacktrace when error on read
Add connector builder module for handling Connector Builder server requests
CDK's read command handler supports Connector Builder list_streams requests
Fix reset pagination issue on test reads
- Low-code CDK: Override refresh_access_token logic in DeclarativeOAuthAuthenticator
Releasing using the new release flow. No change to the CDK per se
OAuth: retry refresh access token requests
Low-Code CDK: duration macro added
support python3.8
Publishing Docker image for source-declarative-manifest
Breaking changes: We have promoted the low-code CDK to Beta. This release contains a number of breaking changes intended to improve the overall usability of the language by reorganizing certain concepts, renaming, reducing some field duplication, and removal of fields that are seldom used.
The changes are:
- Deprecated the concept of Stream Slicers in favor of two individual concepts: Incremental Syncs, and Partition Routers:
  - Stream will define an incremental_sync field which is responsible for defining how the connector should support incremental syncs using a cursor field. DatetimeStreamSlicer has been renamed to DatetimeBasedCursor and can be used for this field.
  - Retrievers will now define a partition_router field. The remaining slicers are now called SubstreamPartitionRouter and ListPartitionRouter, both of which can be used here as they already have been.
  - The CartesianProductStreamSlicer is no longer needed because partition_router can accept a list of values and will generate that same cartesian product by default.
- $options have been renamed to $parameters
- Changed the notation for component references to the JSON schema notation ($ref: "#/definitions/requester")
- DefaultPaginator no longer has a url_base field. Moving forward, paginators will derive the url_base from the HttpRequester. There are some unique cases for connectors that implement a custom Retriever.
- primary_key and name no longer need to be defined on Retrievers or Requesters. They will be derived from the stream’s definition
- Streams no longer define a stream_cursor_field and will derive it from the incremental_sync component. checkpoint_interval has also been deprecated
- DpathExtractor field_pointer has been renamed to field_path
- RequestOption can no longer be used with inject_into set to path. There is now a dedicated RequestPath component moving forward.
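The pure renames in the list above lend themselves to a mechanical rewrite. The following is a minimal sketch, assuming the manifest has already been loaded into plain Python dicts and lists and that components identify themselves through a type key. The helper name migrate_node is hypothetical and not part of the CDK, and the structural changes (moving a slicer under incremental_sync or partition_router, switching to the $ref notation) are deliberately left out.

```python
from typing import Any

# Component rename taken from the list above. Assumption: components carry a "type" key.
TYPE_RENAMES = {"DatetimeStreamSlicer": "DatetimeBasedCursor"}


def migrate_node(node: Any) -> Any:
    """Recursively apply the simple renames to a manifest loaded as dicts and lists.

    Hypothetical helper for illustration only; it is not a CDK function.
    """
    if isinstance(node, list):
        return [migrate_node(item) for item in node]
    if not isinstance(node, dict):
        return node

    migrated = {}
    for key, value in node.items():
        if key == "$options":
            # $options have been renamed to $parameters
            key = "$parameters"
        elif key == "field_pointer" and node.get("type") == "DpathExtractor":
            # DpathExtractor field_pointer has been renamed to field_path
            key = "field_path"
        elif key == "type" and isinstance(value, str):
            value = TYPE_RENAMES.get(value, value)
        migrated[key] = migrate_node(value)
    return migrated


old_manifest = {
    "type": "DatetimeStreamSlicer",
    "$options": {"name": "customers"},
    "extractor": {"type": "DpathExtractor", "field_pointer": ["data"]},
}
new_manifest = migrate_node(old_manifest)
print(new_manifest)
```

Restricting the field_pointer rename to DpathExtractor nodes keeps the sketch from touching unrelated components that happen to use the same key.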
Low-Code CDK: fix signature _parse_records_and_emit_request_and_responses
Low-Code: improve day_delta macro and MinMaxDatetime component
Make HttpAvailabilityStrategy default for HttpStreams
Low-Code CDK: make DatetimeStreamSlicer.step an InterpolatedString
Low-Code: SubstreamSlicer.parent_key - dpath support added
Fix issue when trying to log stream slices that are non-JSON-serializable
Use dpath.util.values method to parse response with nested lists
Use dpath.util.values method to parse response with nested lists
Limiting the number of HTTP requests during a test read
Surface the resolved manifest in the CDK
Add AvailabilityStrategy concept and use check_availability within CheckStream
Add missing package in previous patch release
Handle edge cases for CheckStream - checking connection to empty stream, and checking connection to substream with no parent records
Low-Code: Refactor low-code to use Pydantic model based manifest parsing and component creation
Low-code: Make documentation_url in the Spec be optional
Low-Code: Handle forward references in manifest
Allow for CustomRequester to be defined within declarative manifests
Adding cursor_granularity to the declarative API of DatetimeStreamSlicer
Add utility class to infer schemas from real records
Do not eagerly refresh access token in SingleUseRefreshTokenOauth2Authenticator #20923
Fix the naming of OAuthAuthenticator
Include declarative_component_schema.yaml in the publish to PyPi
Start validating low-code manifests using the declarative_component_schema.yaml file
Reverts additions from versions 0.13.0 and 0.13.3.
Low-code: Add token_expiry_date_format to OAuth Authenticator. Resolve ref schema
Fixed StopIteration exception for empty streams while check_availability runs.
Low-code: Enable low-code CDK users to specify schema inline in the manifest
Low-code: Add SessionTokenAuthenticator
Add Stream.check_availability and Stream.AvailabilityStrategy. Make HttpAvailabilityStrategy the default HttpStream.AvailabilityStrategy.
Lookback window should be applied when a state is supplied as well
Low-code: Finally, make OffsetIncrement.page_size interpolated string or int
Revert breaking change on read_config while keeping the improvement on the error message
Improve error readability when reading JSON config files
Low-code: Log response error message on failure
Low-code: Include the HTTP method used by the request in logging output of the airbyte-cdk
Low-code: Fix the component manifest schema to validate check instead of checker
Declare a new authenticator SingleUseRefreshTokenOauth2Authenticator that can perform connector configuration mutation and emit AirbyteControlMessage.ConnectorConfig.
Low-code: Add start_from_page option to a PageIncrement class
Low-code: Add jinja macro format_datetime
Low-code: Fix reference resolution for connector builder
Low-code: Avoid duplicate HTTP query in simple_retriever
Low-code: Make default_paginator.page_token_option optional
Low-code: Fix filtering vars in InterpolatedRequestInputProvider.eval_request_inputs
Low-code: Allow grant_type to be specified for OAuthAuthenticator
Low-code: Don't update cursor for non-record messages and fix default loader for connector builder manifests
Low-code: Allow for request and response to be emitted as log messages
Low-code: Decouple yaml manifest parsing from the declarative source implementation
Low-code: Allow connector specifications to be defined in the manifest
Low-code: Add support for monthly and yearly incremental updates for DatetimeStreamSlicer
Low-code: Get response.json in a safe way
Low-code: Replace EmptySchemaLoader with DefaultSchemaLoader to retain backwards compatibility
Low-code: Evaluate backoff strategies at runtime
Low-code: Allow for read even when schemas are not defined for a connector yet
Low-code: Fix off by one error with the stream slicers
Low-code: Fix a few bugs with the stream slicers
Low-code: Add support for custom error messages on error response filters
Publish python typehints via py.typed file.
- Propagate options to InterpolatedRequestInputProvider
- Report config validation errors as failed connection status during check.
- Report config validation errors as config_error failure type.
- Low-code: Always convert stream slices output to an iterator
- Replace caching method: VCR.py -> requests-cache with SQLite backend
- Protocol change: supported_sync_modes is now a required property on AirbyteStream. #15591
- Low-code: added hash filter to jinja template
- Low-code: Fix check for streams that do not define a stream slicer
- Low-code: $options do not overwrite parameters that are already set
- Low-code: Pass stream_slice to read_records when reading from CheckStream
- Low-code: Fix default stream schema loader
- Low-code: Expose WaitUntilTimeFromHeader strategy and WaitTimeFromHeader as component type
- Revert 0.1.96
- Improve error for returning non-iterable from connectors parse_response
- Low-code: Expose PageIncrement strategy as component type
- Low-code: Stream schema loader has a default value and can be omitted
- Low-code: Standardize slashes in url_base and path
- Low-code: Properly propagate $options to array items
- Low-code: Log request and response when running check operation in debug mode
- Low-code: Rename LimitPaginator to DefaultPaginator and move page_size field to PaginationStrategy
- Fix error when TypeTransformer tries to warn about invalid transformations in arrays
- Fix: properly emit state when a stream has empty slices, provided by an iterator
- Bugfix: Evaluate response.text only in debug mode
- During incremental syncs allow for streams to emit state messages in the per-stream format
- TypeTransformer now converts simple types to array of simple types
- TypeTransformer make warning message more informative
- Make TypeTransformer more robust to incorrect incoming records
- Emit legacy format when state is unspecified for read override connectors
- Fix per-stream to send legacy format for connectors that override read
- Freeze dataclasses-jsonschema to 2.15.1
- Fix regression in _checkpoint_state arg
- Update Airbyte Protocol model to support protocol_version
- Add NoAuth to declarative registry and auth parse bug fix
- Fix yaml schema parsing when running from docker container
- Fix yaml config parsing when running from docker container
- Add schema validation for declarative YAML connector configs
- Bugfix: Correctly set parent slice stream for sub-resource streams
- Improve filter_secrets to skip empty secrets
- Replace JelloRecordExtractor with DpathRecordExtractor
- Bugfix: Fix bug in DatetimeStreamSlicer's parsing method
- Bugfix: Fix bug in DatetimeStreamSlicer's format method
- Refactor declarative package to dataclasses
- Bugfix: Requester header always converted to string
- Bugfix: Reset paginator state between stream slices
- Bugfix: Record selector handles single records
- Bugfix: DatetimeStreamSlicer cast interpolated result to string before converting to datetime
- Bugfix: Set stream slicer's request options in SimpleRetriever
- AbstractSource emits a state message when reading incremental even if there were no stream slices to process.
- Replace parse-time string interpolation with run-time interpolation in YAML-based sources
- Add support for declarative token authenticator.
- Call init_uncaught_exception_handler from AirbyteEntrypoint.init and Destination.run_cmd
- Add the ability to remove & add records in YAML-based sources
- Allow for detailed debug messages to be enabled using the --debug command.
- Add support for configurable oauth request payload and declarative oauth authenticator.
- Define namespace property on the Stream class inside core.py.
Bugfix: Correctly obfuscate nested secrets and secrets specified inside oneOf blocks inside the connector's spec.
- Remove legacy sentry code
- Add requests.exceptions.ChunkedEncodingError to transient errors so it can be retried
- Add Stream.get_error_display_message() to retrieve user-friendly messages from exceptions encountered while reading streams.
- Add default error message retrieval logic for HTTPStreams following common API patterns.
TypeTransformer.default_convert catch TypeError
Update protocol models to support per-stream state: #12829.
- Update protocol models to include AirbyteTraceMessage
- Emit an AirbyteTraceMessage on uncaught exceptions
- Add AirbyteTracedException
Add support for reading the spec from a YAML file (spec.yaml)
- Add ability to import IncrementalMixin from airbyte_cdk.sources.streams.
- Bumped minimum supported Python version to 3.9.
Remove a false positive error logging during the send process.
Fix BaseBackoffException constructor
Improve logging for Error handling during send process.
Add support for streams with explicit state attribute.
Fix type annotations.
Fix typing errors.
Integrate Sentry for performance and errors tracking.
Log http response status code and its content.
Fix logging of unhandled exceptions: print stacktrace.
Add base pydantic model for connector config and schemas.
Fix build error
Filter airbyte_secrets values at logger and other logging refactorings.
Add __init__.py to mark the directory airbyte_cdk/utils as a package.
Improve URL-creation in CDK. Changed to using urllib.parse.urljoin().
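Because urllib.parse.urljoin() resolves the second argument relative to the first, trailing and leading slashes change the result. The sketch below only illustrates standard-library behavior; the example.com URLs are placeholders, and it does not show how the CDK itself calls the function.

```python
from urllib.parse import urljoin

# Placeholder URLs used purely for illustration.
base_with_slash = "https://api.example.com/v1/"
base_without_slash = "https://api.example.com/v1"

# A relative path is appended when the base ends with a slash.
joined_appended = urljoin(base_with_slash, "users")

# Without the trailing slash, the last path segment of the base is replaced.
joined_replaced = urljoin(base_without_slash, "users")

# A path with a leading slash is resolved against the host root.
joined_rooted = urljoin(base_with_slash, "/users")

print(joined_appended)  # https://api.example.com/v1/users
print(joined_replaced)  # https://api.example.com/users
print(joined_rooted)    # https://api.example.com/users
```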
Fix emitted_at from seconds * 1000 to correct milliseconds.
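The difference between the two forms can be sketched as follows. This is an illustration under an assumption: the changelog does not show the CDK code, so the two helper functions below are hypothetical and only contrast truncating to whole seconds before scaling with scaling before truncating.

```python
def emitted_at_from_seconds(now_seconds: float) -> int:
    """Hypothetical 'seconds * 1000' form: sub-second precision is lost before scaling."""
    return int(now_seconds) * 1000


def emitted_at_milliseconds(now_seconds: float) -> int:
    """Hypothetical corrected form: scale to milliseconds first, then truncate."""
    return int(now_seconds * 1000)


timestamp = 1700000000.5  # a Unix timestamp with half a second of fractional time

print(emitted_at_from_seconds(timestamp))  # 1700000000000
print(emitted_at_milliseconds(timestamp))  # 1700000000500
```

In practice the input would come from a clock such as time.time(); a fixed value is used here so the two results are easy to compare.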
Fix broken logger in streams: add logger inheritance for streams from airbyte.
Fix false warnings on record transform.
Fix logging inside source and streams
Resolve $ref fields for discover json schema.
- Added Sphinx docs airbyte-cdk/python/reference_docs module.
- Added module documents at airbyte-cdk/python/sphinx-docs.md.
- Added Read the Docs publishing configuration at .readthedocs.yaml.
Transforming Python log levels to Airbyte protocol log levels
Updated OAuth2Specification.rootObject type in airbyte_protocol to allow string or int
Fix import logger error
Added check_config_against_spec parameter to Connector abstract class to allow skipping validating the input config against the spec for non-check calls
Improving unit test for logger
Use python standard logging instead of custom class
Modified OAuth2Specification model, added new fields: rootObject and oauthFlowOutputParameters
Added Transform class to use for mutating record value types so they adhere to jsonschema definition.
Added the ability to use caching for efficient synchronization of nested streams.
Allow passing custom headers to request in OAuth2Authenticator.refresh_access_token(): #6219
Resolve nested schema references and move external references to single schema definitions.
- Allow using requests.auth.AuthBase as authenticators instead of custom CDK authenticators.
- Implement Oauth2Authenticator, MultipleTokenAuthenticator and TokenAuthenticator authenticators.
- Add support for both legacy and requests native authenticator to HttpStream class.
No longer prints full config files on validation error to prevent exposing secrets to log file: #5879
Fix incremental stream not saving state when internal limit config is set.
Fix mismatching between number of records actually read and number of records in logs by 1: #5767
Update generated AirbyteProtocol models to contain Oauth changes.
Add _limit and _page_size as internal config parameters for SAT
If the input config file does not comply with spec schema, raise an exception instead of system.exit.
Fix defect with user defined backoff time retry attempts, number of retries logic fixed
Add raise_on_http_errors, max_retries, retry_factor properties to be able to ignore http status errors and modify retry time in HTTP stream
Add checking specified config against spec for read, write, check and discover commands
Add MultipleTokenAuthenticator class to allow cycling through a list of API tokens when making HTTP requests
Allow to fetch primary key info from singer catalog
Allow to use non-JSON payloads in request body for http source
Add abstraction for creating destinations.
Fix logging of the initial state.
Allow specifying keyword arguments to be sent on a request made by an HTTP stream: #4493
Allow to use Python 3.7.0: #3566
Fix an issue that caused infinite pagination: #3366
Initial Release