🐙 octavia-cli: secret management #10885

Merged
merged 5 commits into from
Mar 9, 2022
13 changes: 12 additions & 1 deletion octavia-cli/README.md
Expand Up @@ -34,11 +34,22 @@ Octavia is currently under development.
You can find a detailed and updated execution plan [here](https://docs.google.com/spreadsheets/d/1weB9nf0Zx3IR_QvpkxtjBAzyfGb7B0PWpsVt6iMB5Us/edit#gid=0).
We welcome community contributions!

# Secret management
Source and destination configurations have credential fields that you **do not want to store as plain text and version in Git**.
`octavia` offers secret management through environment variable expansion:
```yaml
configuration:
  password: ${MY_PASSWORD}
```
If you have set a `MY_PASSWORD` environment variable, `octavia apply` will load its value into the `password` field.
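
The README does not say what happens when a referenced variable is not set, so a pre-flight check can be useful. The sketch below is not part of `octavia`: `find_unset_variables` and `REFERENCE_PATTERN` are hypothetical names, and it assumes references always use the `${NAME}` form shown above. It scans the raw YAML text, so references inside commented-out lines are reported too.

```python
import os
import re
from typing import List

# Assumption: secrets are referenced as ${NAME}, as in the README example.
REFERENCE_PATTERN = re.compile(r"\$\{([^}{]+)\}")


def find_unset_variables(yaml_text: str) -> List[str]:
    """Return the names referenced as ${NAME} that are missing from the environment."""
    names = REFERENCE_PATTERN.findall(yaml_text)
    return sorted({name for name in names if name not in os.environ})


# Illustrative values only.
os.environ["MY_PASSWORD"] = "hunter2"
os.environ.pop("MY_API_KEY", None)

sample = "configuration:\n  password: ${MY_PASSWORD}\n  api_key: ${MY_API_KEY}\n"
print(find_unset_variables(sample))  # ['MY_API_KEY']
```

Running a check like this before `octavia apply` would surface a missing secret before any request is sent.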

**Summary of achievements**:

| Date | Milestone |
|------------|-------------------------------------|
| 2022-03-04 | Implement `octavia apply` for connections|
| 2022-03-09 | Implement secret management through environment variable expansion |
| 2022-03-09 | Implement `octavia generate connection`|
| 2022-03-09 | Implement `octavia apply` for connections|
| 2022-03-02 | Implement `octavia apply` (sources and destination only)|
| 2022-02-06 | Implement `octavia generate` (sources and destination only)|
| 2022-01-25 | Implement `octavia init` + some context checks|
Expand Down
Expand Up @@ -13,7 +13,7 @@ configuration:
database: # REQUIRED | string | Name of the database.
schema: "public" # REQUIRED | string | The default schema tables are written to if the source does not specify a namespace. The usual value for this field is "public". | Example: public
username: # REQUIRED | string | Username to use to access the database.
password: # SECRET | OPTIONAL | string | Password associated with the username.
password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
ssl: # OPTIONAL | boolean | Encrypt data using SSL.
tunnel_method:
## -------- Pick one valid structure among the examples below: --------
Expand All @@ -23,10 +23,10 @@ configuration:
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
# ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
# ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
## -------- Another valid structure for tunnel_method: --------
# tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
# tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
# tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host
Expand Up @@ -12,8 +12,8 @@ configuration:
s3_bucket_name: # REQUIRED | string | The name of the S3 bucket. | Example: airbyte_sync
s3_bucket_path: # REQUIRED | string | Directory under the S3 bucket where data will be written. | Example: data_sync/test
s3_bucket_region: # REQUIRED | string | The region of the S3 bucket.
access_key_id: # SECRET | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
secret_access_key: # SECRET | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
access_key_id: ${ACCESS_KEY_ID} # SECRET (please store in environment variables) | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
secret_access_key: ${SECRET_ACCESS_KEY} # SECRET (please store in environment variables) | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
format:
## -------- Pick one valid structure among the examples below: --------
format_type: "Avro" # REQUIRED | string}
Expand Down
Expand Up @@ -13,7 +13,7 @@ configuration:
database: # REQUIRED | string | Name of the database.
schemas: ["public"] # OPTIONAL | array | The list of schemas to sync from. Defaults to user. Case sensitive.
username: # REQUIRED | string | Username to use to access the database.
password: # SECRET | OPTIONAL | string | Password associated with the username.
password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
ssl: # OPTIONAL | boolean | Encrypt client/server communications for increased security.
replication_method:
## -------- Pick one valid structure among the examples below: --------
Expand All @@ -31,10 +31,10 @@ configuration:
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
# ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
# ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
## -------- Another valid structure for tunnel_method: --------
# tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
# tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
# tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host
19 changes: 7 additions & 12 deletions octavia-cli/octavia_cli/apply/diff_helpers.py
Expand Up @@ -3,6 +3,7 @@
#

import hashlib
import json
from typing import Any

import click
Expand All @@ -11,23 +12,17 @@
SECRET_MASK = "**********"


def compute_checksum(file_path: str) -> str:
"""Compute SHA256 checksum from a file
def hash_config(configuration: dict) -> str:
"""Computes a SHA256 hash from a dictionary.

Args:
file_path (str): Path for the file for which you want to compute a checksum.
configuration (dict): The configuration to hash

Returns:
str: The computed hash digest
str: The SHA256 hex digest of the serialized configuration.
"""
BLOCK_SIZE = 65536
file_hash = hashlib.sha256()
with open(file_path, "rb") as f:
fb = f.read(BLOCK_SIZE)
while len(fb) > 0:
file_hash.update(fb)
fb = f.read(BLOCK_SIZE)
return file_hash.hexdigest()
stringified = json.dumps(configuration, sort_keys=True)
return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


def exclude_secrets_from_diff(obj: Any, path: str) -> bool:
Expand Down
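
A minimal sketch of how the new `hash_config` behaves, reusing the function body from the hunk above. The sample dictionaries are made up for illustration. Because of `sort_keys=True`, key order does not affect the digest, including in nested dictionaries. One thing to keep in mind: `json.dumps` raises `TypeError` for values it cannot serialize, such as a `datetime.date`.

```python
import datetime
import hashlib
import json


def hash_config(configuration: dict) -> str:
    # Same body as the function added in this diff.
    stringified = json.dumps(configuration, sort_keys=True)
    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


# Illustrative configurations: same content, different key order.
a = {"host": "localhost", "port": 5432, "nested": {"x": 1, "y": 2}}
b = {"nested": {"y": 2, "x": 1}, "port": 5432, "host": "localhost"}

print(hash_config(a) == hash_config(b))  # True
print(hash_config(a) == hash_config({**a, "port": 5433}))  # False

try:
    hash_config({"start_date": datetime.date(2022, 3, 9)})
except TypeError as error:
    print(f"Cannot hash this configuration: {error}")
```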
32 changes: 16 additions & 16 deletions octavia-cli/octavia_cli/apply/resources.py
Expand Up @@ -32,7 +32,8 @@
from airbyte_api_client.model.source_update import SourceUpdate
from click import ClickException

from .diff_helpers import compute_checksum, compute_diff
from .diff_helpers import compute_diff, hash_config
from .yaml_loaders import EnvVarLoader


class DuplicateResourceError(ClickException):
Expand All @@ -48,27 +49,27 @@ class InvalidConfigurationError(ClickException):


class ResourceState:
def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_checksum: str):
def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_hash: str):
"""This constructor is meant to be private. Construction shall be made with create or from_file class methods.

Args:
configuration_path (str): Path to the configuration path the state relates to.
configuration_path (str): Path to the configuration this state relates to.
resource_id (str): Id of the resource the state relates to.
generation_timestamp (int): State generation timestamp.
configuration_checksum (str): Checksum of the configuration file.
configuration_hash (str): Hash of the configuration.
"""
self.configuration_path = configuration_path
self.resource_id = resource_id
self.generation_timestamp = generation_timestamp
self.configuration_checksum = configuration_checksum
self.configuration_hash = configuration_hash
self.path = os.path.join(os.path.dirname(self.configuration_path), "state.yaml")

def as_dict(self):
return {
"configuration_path": self.configuration_path,
"resource_id": self.resource_id,
"generation_timestamp": self.generation_timestamp,
"configuration_checksum": self.configuration_checksum,
"configuration_path": self.configuration_path,
"configuration_hash": self.configuration_hash,
}

def _save(self) -> None:
Expand All @@ -77,19 +78,20 @@ def _save(self) -> None:
yaml.dump(self.as_dict(), state_file)

@classmethod
def create(cls, configuration_path: str, resource_id: str) -> "ResourceState":
def create(cls, configuration_path: str, configuration: dict, resource_id: str) -> "ResourceState":
"""Create a state for a resource configuration.

Args:
configuration_path (str): Path to the YAML file defining the resource.
configuration (dict): Configuration object that will be hashed.
resource_id (str): UUID of the resource.

Returns:
ResourceState: state representing the resource.
"""
generation_timestamp = int(time.time())
configuration_checksum = compute_checksum(configuration_path)
state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_checksum)
configuration_hash = hash_config(configuration)
state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_hash)
state._save()
return state

Expand All @@ -109,7 +111,7 @@ def from_file(cls, file_path: str) -> "ResourceState":
raw_state["configuration_path"],
raw_state["resource_id"],
raw_state["generation_timestamp"],
raw_state["configuration_checksum"],
raw_state["configuration_hash"],
)


Expand Down Expand Up @@ -198,9 +200,7 @@ def __init__(
self.configuration_path = configuration_path
self.api_instance = self.api(api_client)
self.state = self._get_state_from_file()
self.local_file_changed = (
True if self.state is None else compute_checksum(self.configuration_path) != self.state.configuration_checksum
)
self.local_file_changed = True if self.state is None else hash_config(self.local_configuration) != self.state.configuration_hash

@property
def remote_resource(self):
Expand Down Expand Up @@ -308,7 +308,7 @@ def _create_or_update(
"""
try:
result = operation_fn(self.api_instance, payload, _check_return_type=_check_return_type)
return result, ResourceState.create(self.configuration_path, result[self.resource_id_field])
return result, ResourceState.create(self.configuration_path, self.local_configuration, result[self.resource_id_field])
except airbyte_api_client.ApiException as api_error:
if api_error.status == 422:
# This API response error is really verbose, but it embodies all the details about why the config is not valid.
Expand Down Expand Up @@ -554,7 +554,7 @@ def factory(api_client: airbyte_api_client.ApiClient, workspace_id: str, configu
Union[Source, Destination, Connection]: The resource object created from the YAML config.
"""
with open(configuration_path, "r") as f:
local_configuration = yaml.safe_load(f)
local_configuration = yaml.load(f, EnvVarLoader)
if local_configuration["definition_type"] == "source":
return Source(api_client, workspace_id, local_configuration, configuration_path)
if local_configuration["definition_type"] == "destination":
Expand Down
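
As far as this diff shows, the state now stores a hash of the loaded configuration rather than a checksum of the file. One consequence, sketched below, is that changing a secret in the environment changes the hash even though the YAML file is untouched. `load_expanded` is a hypothetical stand-in for `yaml.load(f, EnvVarLoader)` that only handles flat dictionaries, and the values are illustrative.

```python
import hashlib
import json
import os


def hash_config(configuration: dict) -> str:
    # Same body as the function added in diff_helpers.py.
    stringified = json.dumps(configuration, sort_keys=True)
    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


def load_expanded(raw: dict) -> dict:
    """Hypothetical stand-in for the YAML loader: expand string values only."""
    return {key: os.path.expandvars(value) if isinstance(value, str) else value for key, value in raw.items()}


# The file content never changes between the two runs.
raw_file_content = {"username": "airbyte", "password": "${PASSWORD}"}

os.environ["PASSWORD"] = "first-secret"
hash_before = hash_config(load_expanded(raw_file_content))

os.environ["PASSWORD"] = "rotated-secret"
hash_after = hash_config(load_expanded(raw_file_content))

print(hash_before != hash_after)  # True: the rotated secret counts as a local change
```

A file checksum would have reported no change in this situation, since only the environment differs.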
35 changes: 35 additions & 0 deletions octavia-cli/octavia_cli/apply/yaml_loaders.py
@@ -0,0 +1,35 @@
#
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
#

import os
import re
from typing import Any

import yaml

ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")


def env_var_replacer(loader: yaml.Loader, node: yaml.Node) -> Any:
    """Convert a YAML node to a Python object, expanding environment variables.

    Args:
        loader (yaml.Loader): Not used.
        node (yaml.Node): YAML node to convert to a Python object.

    Returns:
        Any: Python object with expanded variables.
    """
    return os.path.expandvars(node.value)


class EnvVarLoader(yaml.SafeLoader):
    pass


# All yaml nodes matching the regex will be tagged as !environment_variable.
EnvVarLoader.add_implicit_resolver("!environment_variable", ENV_VAR_MATCHER_PATTERN, None)

# All yaml nodes tagged as !environment_variable will be constructed with the env_var_replacer callback.
EnvVarLoader.add_constructor("!environment_variable", env_var_replacer)
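
The combined effect of the resolver and the constructor on a single scalar can be approximated without PyYAML. This is a sketch, not the loader itself: `expand_if_tagged` is a hypothetical helper, and it assumes the pattern is applied from the start of each scalar string. It shows two edge cases: a bare `$NAME` is not matched by the pattern, and an unset variable is left unchanged by `os.path.expandvars`.

```python
import os
import re

# Same pattern as in yaml_loaders.py.
ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")


def expand_if_tagged(scalar: str) -> str:
    """Hypothetical helper: expand a scalar only when it matches the pattern."""
    if ENV_VAR_MATCHER_PATTERN.match(scalar):
        return os.path.expandvars(scalar)
    return scalar


# Illustrative values only.
os.environ["DB_PASSWORD"] = "s3cr3t"
os.environ.pop("UNSET_VAR", None)

print(expand_if_tagged("${DB_PASSWORD}"))              # s3cr3t
print(expand_if_tagged("jdbc://${DB_PASSWORD}@host"))  # jdbc://s3cr3t@host
print(expand_if_tagged("$DB_PASSWORD"))                # $DB_PASSWORD (no ${...}, not matched)
print(expand_if_tagged("${UNSET_VAR}"))                # ${UNSET_VAR} (unset, left as is)
```

The last case means a missing variable would reach the API as the literal placeholder string rather than raising an error at load time, at least in this approximation.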
4 changes: 3 additions & 1 deletion octavia-cli/octavia_cli/generate/renderers.py
Expand Up @@ -93,7 +93,7 @@ def _get_type_comment(self) -> str:
return self.type if self.type else None

def _get_secret_comment(self) -> str:
return "SECRET" if self.airbyte_secret else None
return "SECRET (please store in environment variables)" if self.airbyte_secret else None

def _get_description_comment(self) -> str:
return self.description if self.description else None
Expand All @@ -113,6 +113,8 @@ def _get_example_comment(self) -> str:
def _get_default(self) -> str:
if self.const:
return self.const
if self.airbyte_secret:
return f"${{{self.name.upper()}}}"
return self.default

@staticmethod
Expand Down
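
The f-string in `_get_default` relies on brace escaping: `{{` and `}}` produce literal braces and the inner `{...}` is the substitution. A standalone sketch, where `secret_placeholder` is a hypothetical name used only for this illustration:

```python
def secret_placeholder(field_name: str) -> str:
    """Build a ${NAME} placeholder from a field name, as _get_default does for secrets."""
    return f"${{{field_name.upper()}}}"


print(secret_placeholder("password"))              # ${PASSWORD}
print(secret_placeholder("tunnel_user_password"))  # ${TUNNEL_USER_PASSWORD}
```

Since the placeholder is derived from the field name alone, two resources with a `password` field both default to `${PASSWORD}`, as the generated source and destination templates in this diff show. Renaming the variable per resource avoids sharing one secret by accident.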
Expand Up @@ -7,7 +7,7 @@ definition_image: {{ definition.docker_repository }}
definition_version: {{ definition.docker_image_tag }}

{%- macro render_field(field, is_commented) %}
{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {{ field.default | tojson() }}{% endif %} # {{ field.comment }}
{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {% if field.airbyte_secret %}{{ field.default }}{% else %}{{ field.default | tojson() }}{% endif %}{% endif %} # {{ field.comment }}
{%- endmacro %}

{%- macro render_sub_fields(sub_fields, is_commented) %}
Expand Down
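
The template change skips `tojson` for secret fields. A rough illustration of the difference, using `json.dumps` as an approximation of Jinja's `tojson` filter (an assumption made for this sketch): JSON serialization wraps strings in double quotes, while the secret placeholder is presumably meant to be written bare, as in the generated examples above.

```python
import json

placeholder = "${PASSWORD}"

print(json.dumps(placeholder))  # "${PASSWORD}"  (what a JSON-serialized default looks like)
print(placeholder)              # ${PASSWORD}    (what the template now emits for secrets)
print(json.dumps("public"))     # "public"       (non-secret defaults stay quoted)
```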
10 changes: 3 additions & 7 deletions octavia-cli/unit_tests/test_apply/test_diff_helpers.py
Expand Up @@ -2,17 +2,13 @@
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
#

from unittest.mock import mock_open, patch

import pytest
from octavia_cli.apply import diff_helpers


def test_compute_checksum(mocker):
with patch("builtins.open", mock_open(read_data=b"data")) as mock_file:
digest = diff_helpers.compute_checksum("test_file_path")
assert digest == "3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7"
mock_file.assert_called_with("test_file_path", "rb")
def test_hash_config():
data_to_hash = {"example": "foo"}
assert diff_helpers.hash_config(data_to_hash) == "8d621bd700ff9a864bc603f56b4ec73536110b37d814dd4629767e898da70bef"


@pytest.mark.parametrize(
Expand Down