Skip to content

✨ airbyte-cdk - Adds JwtAuthenticator to low-code #37005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 47 commits into from
Apr 19, 2024
Merged
Show file tree
Hide file tree
Changes from 14 commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
ce29fb5
Initial jwt authenticator build
pnilan Apr 9, 2024
bb0789d
Update to include config eval
pnilan Apr 10, 2024
e12254b
Update typo
pnilan Apr 10, 2024
dca7b06
Updates JWT auth, schema, and factory
pnilan Apr 11, 2024
58df22d
Update property retrieval in component factory
pnilan Apr 11, 2024
c8160c8
add token duration check
pnilan Apr 11, 2024
16f5165
Fix error with exp header setting
pnilan Apr 11, 2024
1b7b006
Update _get_jwt_payload
pnilan Apr 11, 2024
d41e009
Updates jwt.auth to interpolate inputs correctly
pnilan Apr 11, 2024
a1b2965
Clean up redundancy by removing "alg" from jwt headers
pnilan Apr 11, 2024
b25df50
Updates expiration time setting in jwt payload method
pnilan Apr 11, 2024
86809bd
Update jwt schema to remove HS256
pnilan Apr 11, 2024
63db2a6
Merge branch 'master' into pnilan/airbyte-cdk-jwt-auth
pnilan Apr 11, 2024
40717f5
chore: format code
pnilan Apr 11, 2024
0e374b3
Adds base64 secret key encoding, updates jwt.py returns, adds default…
pnilan Apr 15, 2024
70c2deb
Ran `poetry run poe build`
pnilan Apr 15, 2024
7388a46
Imports InterpolateBoolean for base64 encoding
pnilan Apr 15, 2024
b67ec03
Updates jwt.py to convert values to strings for serialization
pnilan Apr 15, 2024
0aa43a6
Updates jwt.py to convert secret key and algorithm to string
pnilan Apr 15, 2024
342cfad
Update how boolean is handled for base64 encoding
pnilan Apr 15, 2024
a927323
Adds base64_encode_secret_key to component factory
pnilan Apr 16, 2024
46ff236
Add dev defined header prefix option
pnilan Apr 16, 2024
111180a
Updates header prefix
pnilan Apr 16, 2024
5bc18dd
Updates get header prefix for brevity
pnilan Apr 16, 2024
e351f3e
Cleanup
pnilan Apr 16, 2024
00db62e
Wraps jwt.encode in try-catch to handle pyjwt errors
pnilan Apr 16, 2024
8db26a5
Adds unit testing for jwt.py and for model_to_component_factory.py
pnilan Apr 16, 2024
0c14620
Updates jwt schema, model, factory, and class to make jwt_headers and…
pnilan Apr 16, 2024
790e7ea
Merge branch 'master' into pnilan/airbyte-cdk-jwt-auth
pnilan Apr 16, 2024
ae0cef5
chore: format code
pnilan Apr 16, 2024
dd6377b
Updates Algorithm to be enumeration based on PyJWT supported algorithms
pnilan Apr 17, 2024
252f863
Updates unit tests for updated algorithm enumeration
pnilan Apr 17, 2024
0284d30
chore: format code
pnilan Apr 17, 2024
be03266
Merge branch 'master' into pnilan/airbyte-cdk-jwt-auth
pnilan Apr 17, 2024
7515a54
Remove unused import
pnilan Apr 17, 2024
2a419e7
Updates jwt.py, model_to_componet_factory, and relevant tests to reso…
pnilan Apr 17, 2024
47c4604
Update jwt.py for `Any` type when throwing exception
pnilan Apr 17, 2024
4b8512d
Adds explanations of JWT authenticator methods
pnilan Apr 17, 2024
a7559cb
chore: format code
pnilan Apr 17, 2024
2772d66
Updated for linting
pnilan Apr 17, 2024
0e4b8e2
reverts changelog, pyproject.toml, and poetry.lock to previous airbyt…
pnilan Apr 17, 2024
aeca944
Updates JwtAlgorithm to include all supported algos
pnilan Apr 18, 2024
9a32e55
Merge branch 'master' into pnilan/airbyte-cdk-jwt-auth
pnilan Apr 18, 2024
c7bd90b
Reverst CHANGELOG, updates `model_to_component_factory` method condit…
pnilan Apr 18, 2024
0723fee
Update low code documentation to include JwtAuthenticator
pnilan Apr 18, 2024
bb839c9
Fix conditional check error for `create_jwt_authenticator`
pnilan Apr 18, 2024
6075a42
Update `create_jwt_authenticator` jwt_headers and jwt_payload setting
pnilan Apr 18, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 9 additions & 6 deletions airbyte-cdk/python/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
# Changelog

## 0.82.0
low-code: Add JWTAuthenticator

## 0.81.3
Republish print buffer after previous pypi attempt timed out

Expand Down Expand Up @@ -31,7 +34,7 @@ Fix CDK version mismatch introduced in 0.78.8
Update error messaging/type for missing streams. Note: version mismatch, please use 0.78.9 instead

## 0.78.6
low-code: add backward compatibility for old close slice behavior
low-code: add backward compatibility for old close slice behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

were changes to this file intentional?

Copy link
Contributor Author

@pnilan pnilan Apr 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These were unintentional -- I didn't realize there was a Publish CDK action so I had previously manually updated the changelog/pyproject, then reverted the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok please only revert changes to this file. It'll be updated as part of the publish action

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


## 0.78.5
low-code: fix stop_condition instantiation in the cursor pagination
Expand All @@ -43,7 +46,7 @@ low-code: Add last_record and last_page_size interpolation variables to paginati
Fix dependencies for file-based extras

## 0.78.2
low-code: fix retrieving partition key for legacy state migration
low-code: fix retrieving partition key for legacy state migration

## 0.78.1
connector-builder: return full url-encoded URL instead of separating parameters
Expand Down Expand Up @@ -351,7 +354,7 @@ File CDK: Avoid listing all files for check command
Vector DB CDK: Expose stream identifier logic, add field remapping to processing | File CDK: Emit analytics message for used streams

## 0.51.40
Add filters for base64 encode and decode in Jinja Interpolation
Add filters for base64 encode and decode in Jinja Interpolation

## 0.51.39
Few bug fixes for concurrent cdk
Expand Down Expand Up @@ -680,8 +683,8 @@ Publishing Docker image for source-declarative-manifest
## 0.29.0
**Breaking changes: We have promoted the low-code CDK to Beta. This release contains a number of breaking changes intended to improve the overall usability of the language by reorganizing certain concepts, renaming, reducing some field duplication, and removal of fields that are seldom used.**

The changes are:
* Deprecated the concept of Stream Slicers in favor of two individual concepts: Incremental Syncs, and Partition Routers:
The changes are:
* Deprecated the concept of Stream Slicers in favor of two individual concepts: Incremental Syncs, and Partition Routers:
* Stream will define an `incremental_sync` field which is responsible for defining how the connector should support incremental syncs using a cursor field. `DatetimeStreamSlicer` has been renamed to `DatetimeBasedCursor` and can be used for this field.
* `Retriever`s will now define a `partition_router` field. The remaining slicers are now called `SubstreamPartitionRouter` and `ListPartitionRouter`, both of which can be used here as they already have been.
* The `CartesianProductStreamSlicer` because `partition_router` can accept a list of values and will generate that same cartesian product by default.
Expand Down Expand Up @@ -860,7 +863,7 @@ Low-code: Fix a few bugs with the stream slicers
Low-code: Add support for custom error messages on error response filters

## 0.3.0
Publish python typehints via `py.typed` file.
Publish python typehints via `py.typed` file.

## 0.2.3
- Propagate options to InterpolatedRequestInputProvider
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,9 @@
#

from airbyte_cdk.sources.declarative.auth.oauth import DeclarativeOauth2Authenticator
from airbyte_cdk.sources.declarative.auth.jwt import JwtAuthenticator

__all__ = [
"DeclarativeOauth2Authenticator",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why SelectiveAuthenticator isn't here.

Copy link
Contributor Author

@pnilan pnilan Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a good answer for you. Oauth is re-exported via __init__ but all other authenticators are accessed directly from their files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not intentional. It shouldn't matter as these classes should only be used through the YAML interface

"JwtAuthenticator"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, should this also be DeclarativeJWTAuthenticator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the oauth2 auth would be the outlier: ApiKeyAuthenticator, BearerAuthenticator, BasicHttpAuthenticator, etc don't explicitly note declarative.

]
102 changes: 102 additions & 0 deletions airbyte-cdk/python/airbyte_cdk/sources/declarative/auth/jwt.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from dataclasses import InitVar, dataclass
from datetime import datetime
from typing import Any, Mapping, Optional, Union

import jwt
from airbyte_cdk.sources.declarative.auth.declarative_authenticator import DeclarativeAuthenticator
from airbyte_cdk.sources.declarative.interpolation.interpolated_mapping import InterpolatedMapping
from airbyte_cdk.sources.declarative.interpolation.interpolated_string import InterpolatedString


@dataclass
class JwtAuthenticator(DeclarativeAuthenticator):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea what our naming convention is — do we want acronyms to be all caps? I.e. JWTAuthenticator or JwtAuthenticator?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not attached to it but I find JwtAuthenticator a bit easier to read

Copy link
Contributor Author

@pnilan pnilan Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed ApiKeyAuthenticator's lead.


config: Mapping[str, Any]
parameters: InitVar[Mapping[str, Any]]
secret_key: Union[InterpolatedString, str]
algorithm: Union[InterpolatedString, str]
token_duration: Union[InterpolatedString, str] = 1200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a union[interpolated string, str, int]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to make token duration only an integer.

kid: Union[InterpolatedString, str] = None
typ: Union[InterpolatedString, str] = "JWT"
iss: Union[InterpolatedString, str] = None
sub: Union[InterpolatedString, str] = None
aud: Union[InterpolatedString, str] = None
cty: Union[InterpolatedString, str] = None
additional_jwt_headers: Mapping[str, Any] = None
additional_jwt_payload: Mapping[str, Any] = None

def __post_init__(self, parameters: Mapping[str, Any]) -> None:
super().__init__()
self._algorithm = InterpolatedString.create(self.algorithm, parameters=parameters)
self._secret_key = InterpolatedString.create(self.secret_key, parameters=parameters)
self._kid = InterpolatedString.create(self.kid, parameters=parameters)
self._typ = InterpolatedString.create(self.typ, parameters=parameters)
self._iss = InterpolatedString.create(self.iss, parameters=parameters)
self._sub = InterpolatedString.create(self.sub, parameters=parameters)
self._aud = InterpolatedString.create(self.aud, parameters=parameters)
self._cty = InterpolatedString.create(self.cty, parameters=parameters)
self._token_duration = self.token_duration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably also need to call InterpolatedString.create since self.token_duration can also be a string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated this to make token duration only an integer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means the value can't be interpolated. Is that ok?

self._additional_jwt_headers = InterpolatedMapping(self.additional_jwt_headers or {}, parameters=parameters)
self._additional_jwt_payload = InterpolatedMapping(self.additional_jwt_payload or {}, parameters=parameters)

def _get_jwt_headers(self) -> Mapping[str, Any]:
headers = {}
headers.update(self._additional_jwt_headers.eval(self.config))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these two lines could just be
headers = self._additional_jwt_headers.eval(self.config)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorporated.

if self._kid:
headers["kid"] = f"{self._kid.eval(self.config)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail if "kid" or other required params was also defined in the additional jwt headers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Added a check for both jwt_headers and jwt_payload.

if self._algorithm:
headers["alg"] = self._get_algorithm()
if self._typ:
headers["typ"] = f"{self._typ.eval(self.config)}"
if self._cty:
headers["cty"] = f"{self._cty.eval(self.config)}"
return headers

def _get_jwt_payload(self) -> Mapping[str, Any]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment explaining how this works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda Sorry, can you clarify what you're requesting? An comment within the code explaining how the _get_jwt_payload works? Or for each method?

payload = {}
now = int(datetime.now().timestamp())
exp = now + self._token_duration
nbf = now
payload.update(self._additional_jwt_payload.eval(self.config))
if self._iss:
payload["iss"] = f"{self._iss.eval(self.config)}"
if self._sub:
payload["sub"] = f"{self._sub.eval(self.config)}"
if self._aud:
payload["aud"] = f"{self._aud.eval(self.config)}"
payload["iat"] = now
payload["exp"] = exp
payload["nbf"] = nbf
return payload

def _get_algorithm(self) -> str:
algorithm: str = self._algorithm.eval(self.config)
if not algorithm:
raise ValueError("Algorithm is required")
return f"{algorithm}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"{algorithm}"
return algorithm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


def _get_secret_key(self) -> str:
secret_key: str = self._secret_key.eval(self.config)
if not secret_key:
raise ValueError("secret_key is required")
return f"{secret_key}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return f"{secret_key}"
return secret_key

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.


def _get_signed_token(self) -> str:
return jwt.encode(
payload=self._get_jwt_payload(),
key=self._get_secret_key(),
algorithm=self._get_algorithm(),
headers=self._get_jwt_headers(),
)

@property
def auth_header(self) -> str:
return "Authorization"

@property
def token(self) -> str:
return f"Bearer {self._get_signed_token()}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the prefix guaranteed to always be Bearer or should we make this configurable to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It appears Bearer is not required but is common. The prefix can even be empty. I've added a header_prefix property to the auth schema.

Original file line number Diff line number Diff line change
Expand Up @@ -257,13 +257,15 @@ definitions:
- "$ref": "#/definitions/BearerAuthenticator"
- "$ref": "#/definitions/CustomAuthenticator"
- "$ref": "#/definitions/OAuthAuthenticator"
- "$ref": "#/definitions/JwtAuthenticator"
- "$ref": "#/definitions/NoAuth"
- "$ref": "#/definitions/SessionTokenAuthenticator"
- "$ref": "#/definitions/LegacySessionTokenAuthenticator"
examples:
- authenticators:
token: "#/definitions/ApiKeyAuthenticator"
oauth: "#/definitions/OAuthAuthenticator"
jwt: "#/definitions/JwtAuthenticator"
$parameters:
type: object
additionalProperties: true
Expand Down Expand Up @@ -833,6 +835,95 @@ definitions:
$parameters:
type: object
additionalProperties: true
JwtAuthenticator:
title: JWT Authenticator
description: Authenticator for requests using JWT authentication flow.
type: object
required:
- type
- secret_key
- algorithm
- jwt_headers
- jwt_payload
properties:
type:
type: string
enum: [JwtAuthenticator]
secret_key:
type: string
description: Secret used to sign the JSON web token.
examples:
- "{{ config['secret_key'] }}"
algorithm:
type: string
description: Algorithm used to sign the JSON web token.
examples:
- "ES256"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so... I haven't found any consistency in the algorithm that APIs use:

token_duration:
type: integer
title: Token Duration
description: The amount of time in seconds a JWT token can be valid after being issued.
examples:
- 1200
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a common default value we can use here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen any consistency, although 1200s (20 min) is the smallest duration I've seen -- maybe a good default.

That being said, does this mean if a sync takes longer than 20 minutes it would fail? Should I include some sort of refresh mechanism? Or, is the authenticator re-instantiated per read?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question! can we refresh the token at runtime? this is how we do it for oauth authenticators

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girarda Actually I believe we're good -- each time the the request is prepared it will invoke _get_jwt_headers which will "refresh" the expiration time (and therefore refresh the token, as the token is the headers, payload, secret_key all encoded into a single string).

jwt_headers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are any of those properties required?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but they are the most common properties.

type: object
title: JWT Headers
description: JWT headers used when signing JSON web token.
additionalProperties: false
properties:
kid:
type: string
title: Key Identifier
description: Private key ID for user account.
examples:
- "{{ config['kid'] }}"
typ:
type: string
title: Type
description: The media type of the complete JWT.
examples:
- JWT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's set JWT as a default value

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Incorporated.

cty:
type: string
title: Content Type
description: Content type of JWT header.
examples:
- JWT
additional_jwt_headers:
type: object
title: Additional JWT Headers
description: Additional headers to be included with the JWT headers object.
additionalProperties: true
jwt_payload:
type: object
title: JWT Payload
description: JWT Payload used when signing JSON web token.
additionalProperties: false
properties:
iss:
type: string
title: Issuer
description: The user/principal that issued the JWT. Commonly a value unique to the user.
examples:
- "{{ config['iss'] }}"
sub:
type: string
title: Subject
description: The subject of the JWT. Commonly defined by the API.
aud:
type: string
title: Audience
description: The recipient that the JWT is intended for. Commonly defined by the API.
examples:
- "appstoreconnect-v1"
additional_jwt_payload:
type: object
title: Additional JWT Payload Properties
description: Additional properties to be added to the JWT payload.
additionalProperties: true
$parameters:
type: object
additionalProperties: true
OAuthAuthenticator:
title: OAuth2
description: Authenticator for requests using OAuth 2.0 authorization flow.
Expand Down Expand Up @@ -1311,6 +1402,7 @@ definitions:
- "$ref": "#/definitions/BearerAuthenticator"
- "$ref": "#/definitions/CustomAuthenticator"
- "$ref": "#/definitions/OAuthAuthenticator"
- "$ref": "#/definitions/JwtAuthenticator"
- "$ref": "#/definitions/NoAuth"
- "$ref": "#/definitions/SessionTokenAuthenticator"
- "$ref": "#/definitions/LegacySessionTokenAuthenticator"
Expand Down
Loading
Loading