Commit 69e4eba

🐙 octavia-cli: implement secret management
1 parent 706d7f1 commit 69e4eba

14 files changed

+149
-70
lines changed

octavia-cli/README.md

+13-3
@@ -21,13 +21,22 @@ SUB_BUILD=OCTAVIA_CLI ./gradlew build #from the root of the repo
2121
2. Run the CLI from docker:
2222
```bash
2323
docker run airbyte/octavia-cli:dev
24-
````
24+
```
2525
3. Create an `octavia` alias in your `.bashrc` or `.zshrc`:
26-
````bash
26+
```bash
2727
echo 'alias octavia="docker run airbyte/octavia-cli:dev"' >> ~/.zshrc
2828
source ~/.zshrc
2929
octavia
30-
````
30+
```
31+
32+
# Secret management
33+
Source and destination configurations have credential fields that you **do not want to store as plain text or version in Git**.
34+
`octavia` offers secret management through environment variable expansion:
35+
```yaml
36+
configuration:
37+
password: ${MY_PASSWORD}
38+
```
39+
If you have set a `MY_PASSWORD` environment variable, `octavia apply` will load its value into the `password` field.
3140

3241
# Current development status
3342
Octavia is currently under development.
@@ -38,6 +47,7 @@ We welcome community contributions!
3847

3948
| Date | Milestone |
4049
|------------|-------------------------------------|
50+
| 2022-03-06 | Implement secret management through environment variable expansion |
4151
| 2022-03-02 | Implement `octavia apply` (sources and destination only)|
4252
| 2022-02-06 | Implement `octavia generate` (sources and destination only)|
4353
| 2022-01-25 | Implement `octavia init` + some context checks|
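
A note on the expansion semantics described in the README change above. The loader added in this commit delegates to Python's `os.path.expandvars`, which leaves references to unset variables untouched instead of raising. The sketch below only illustrates that standard-library behaviour; `MY_PASSWORD` comes from the README, while `NOT_SET_ANYWHERE` and the placeholder value are assumptions made for the example.

```python
import os

# MY_PASSWORD mirrors the README example; the value is a placeholder.
os.environ["MY_PASSWORD"] = "s3cr3t"
# Hypothetical variable name, removed to simulate an unset variable.
os.environ.pop("NOT_SET_ANYWHERE", None)

expanded = os.path.expandvars("${MY_PASSWORD}")
untouched = os.path.expandvars("${NOT_SET_ANYWHERE}")

print(expanded)   # the value of MY_PASSWORD
print(untouched)  # the literal reference, because the variable is unset
```

If this reading is right, a forgotten variable would reach the API as the literal `${...}` text, so it may be worth checking the environment before running `octavia apply`.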

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/destination_postgres/expected.yaml

+3-3
@@ -13,7 +13,7 @@ configuration:
1313
database: # REQUIRED | string | Name of the database.
1414
schema: "public" # REQUIRED | string | The default schema tables are written to if the source does not specify a namespace. The usual value for this field is "public". | Example: public
1515
username: # REQUIRED | string | Username to use to access the database.
16-
password: # SECRET | OPTIONAL | string | Password associated with the username.
16+
password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
1717
ssl: # OPTIONAL | boolean | Encrypt data using SSL.
1818
tunnel_method:
1919
## -------- Pick one valid structure among the examples below: --------
@@ -23,10 +23,10 @@ configuration:
2323
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
2424
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
2525
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
26-
# ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
26+
# ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
2727
## -------- Another valid structure for tunnel_method: --------
2828
# tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
2929
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
3030
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
3131
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
32-
# tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
32+
# tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/destination_s3/expected.yaml

+3-3
@@ -12,8 +12,8 @@ configuration:
1212
s3_bucket_name: # REQUIRED | string | The name of the S3 bucket. | Example: airbyte_sync
1313
s3_bucket_path: # REQUIRED | string | Directory under the S3 bucket where data will be written. | Example: data_sync/test
1414
s3_bucket_region: # REQUIRED | string | The region of the S3 bucket.
15-
access_key_id: # SECRET | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
16-
secret_access_key: # SECRET | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
15+
access_key_id: ${ACCESS_KEY_ID} # SECRET (please store in environment variables) | OPTIONAL | string | The access key id to access the S3 bucket. Airbyte requires Read and Write permissions to the given bucket, if not set, Airbyte will rely on Instance Profile. | Example: A012345678910EXAMPLE
16+
secret_access_key: ${SECRET_ACCESS_KEY} # SECRET (please store in environment variables) | OPTIONAL | string | The corresponding secret to the access key id, if S3 Key Id is set, then S3 Access Key must also be provided | Example: a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY
1717
format:
1818
## -------- Pick one valid structure among the examples below: --------
1919
format_type: "Avro" # REQUIRED | string}
@@ -31,7 +31,7 @@ configuration:
3131
## -------- Another valid structure for compression_codec: --------
3232
# codec: "zstandard" # REQUIRED | string
3333
# compression_level: 3 # REQUIRED | integer | Negative levels are 'fast' modes akin to lz4 or snappy, levels above 9 are generally for archival purposes, and levels above 18 use a lot of memory.
34-
# include_checksum: # OPTIONAL | boolean | If true, include a checksum with each data block.
34+
# include_hash: # OPTIONAL | boolean | If true, include a hash with each data block.
3535
## -------- Another valid structure for compression_codec: --------
3636
# codec: "snappy" # REQUIRED | string
3737
part_size_mb: 5 # OPTIONAL | integer | This is the size of a "Part" being buffered in memory. It limits the memory usage when writing. Larger values will allow uploading bigger files and improve the speed, but consume more memory. Allowed values: min=5MB, max=525MB Default: 5MB. | Example: 5

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/destination_s3/input_spec.yaml

+3-3
@@ -180,9 +180,9 @@ spec:
180180
default: 3
181181
minimum: -5
182182
maximum: 22
183-
include_checksum:
184-
title: "Include checksum"
185-
description: "If true, include a checksum with each data block."
183+
include_hash:
184+
title: "Include hash"
185+
description: "If true, include a hash with each data block."
186186
type: "boolean"
187187
default: false
188188
- title: "snappy"

octavia-cli/integration_tests/test_generate/expected_rendered_yaml/source_postgres/expected.yaml

+3-3
@@ -13,7 +13,7 @@ configuration:
1313
database: # REQUIRED | string | Name of the database.
1414
schemas: ["public"] # OPTIONAL | array | The list of schemas to sync from. Defaults to user. Case sensitive.
1515
username: # REQUIRED | string | Username to use to access the database.
16-
password: # SECRET | OPTIONAL | string | Password associated with the username.
16+
password: ${PASSWORD} # SECRET (please store in environment variables) | OPTIONAL | string | Password associated with the username.
1717
ssl: # OPTIONAL | boolean | Encrypt client/server communications for increased security.
1818
replication_method:
1919
## -------- Pick one valid structure among the examples below: --------
@@ -31,10 +31,10 @@ configuration:
3131
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
3232
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
3333
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host.
34-
# ssh_key: # SECRET | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
34+
# ssh_key: ${SSH_KEY} # SECRET (please store in environment variables) | REQUIRED | string | OS-level user account ssh key credentials in RSA PEM format ( created with ssh-keygen -t rsa -m PEM -f myuser_rsa )
3535
## -------- Another valid structure for tunnel_method: --------
3636
# tunnel_method: "SSH_PASSWORD_AUTH" # REQUIRED | string | Connect through a jump server tunnel host using username and password authentication
3737
# tunnel_host: # REQUIRED | string | Hostname of the jump server host that allows inbound ssh tunnel.
3838
# tunnel_port: 22 # REQUIRED | integer | Port on the proxy/jump server that accepts inbound ssh connections. | Example: 22
3939
# tunnel_user: # REQUIRED | string | OS-level username for logging into the jump server host
40-
# tunnel_user_password: # SECRET | REQUIRED | string | OS-level password for logging into the jump server host
40+
# tunnel_user_password: ${TUNNEL_USER_PASSWORD} # SECRET (please store in environment variables) | REQUIRED | string | OS-level password for logging into the jump server host

octavia-cli/octavia_cli/apply/diff_helpers.py

+7-12
@@ -3,6 +3,7 @@
33
#
44

55
import hashlib
6+
import json
67
from typing import Any
78

89
import click
@@ -11,23 +12,17 @@
1112
SECRET_MASK = "**********"
1213

1314

14-
def compute_checksum(file_path: str) -> str:
15-
"""Compute SHA256 checksum from a file
15+
def hash_config(configuration: dict) -> str:
16+
"""Computes a SHA256 hash from a dictionary.
1617
1718
Args:
18-
file_path (str): Path for the file for which you want to compute a checksum.
19+
configuration (dict): The configuration to hash
1920
2021
Returns:
21-
str: The computed hash digest
22+
str: The computed hash digest
2223
"""
23-
BLOCK_SIZE = 65536
24-
file_hash = hashlib.sha256()
25-
with open(file_path, "rb") as f:
26-
fb = f.read(BLOCK_SIZE)
27-
while len(fb) > 0:
28-
file_hash.update(fb)
29-
fb = f.read(BLOCK_SIZE)
30-
return file_hash.hexdigest()
24+
stringified = json.dumps(configuration, sort_keys=True)
25+
return hashlib.sha256(stringified.encode("utf-8")).hexdigest()
3126

3227

3328
def exclude_secrets_from_diff(obj: Any, path: str) -> bool:
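
The `sort_keys=True` argument in `hash_config` is what makes the digest independent of key order. A minimal sketch of that property follows; the function body is copied from the diff above, and the three sample dictionaries are hypothetical.

```python
import hashlib
import json


def hash_config(configuration: dict) -> str:
    # Same two steps as the function added in this commit:
    # serialize with sorted keys, then hash the UTF-8 bytes.
    stringified = json.dumps(configuration, sort_keys=True)
    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


# Hypothetical configurations: the first two differ only in key order.
first = {"host": "localhost", "port": 5432}
second = {"port": 5432, "host": "localhost"}
changed = {"host": "localhost", "port": 5433}

same_hash = hash_config(first) == hash_config(second)
different_hash = hash_config(first) != hash_config(changed)
print(same_hash, different_hash)
```

Note that `json.dumps` raises `TypeError` for values it cannot serialize, so this approach assumes the loaded YAML contains only JSON-compatible types.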

octavia-cli/octavia_cli/apply/resources.py

+20-17
@@ -23,7 +23,8 @@
2323
from airbyte_api_client.model.source_update import SourceUpdate
2424
from click import ClickException
2525

26-
from .diff_helpers import compute_checksum, compute_diff
26+
from .diff_helpers import compute_diff, hash_config
27+
from .yaml_loaders import EnvVarLoader
2728

2829

2930
class DuplicateResourceError(ClickException):
@@ -39,27 +40,27 @@ class InvalidConfigurationError(ClickException):
3940

4041

4142
class ResourceState:
42-
def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_checksum: str):
43+
def __init__(self, configuration_path: str, resource_id: str, generation_timestamp: int, configuration_hash: str):
4344
"""This constructor is meant to be private. Construction shall be made with create or from_file class methods.
4445
4546
Args:
46-
configuration_path (str): Path to the configuration path the state relates to.
47+
configuration_path (str): Path to the configuration this state relates to.
4748
resource_id (str): Id of the resource the state relates to.
4849
generation_timestamp (int): State generation timestamp.
49-
configuration_checksum (str): Checksum of the configuration file.
50+
configuration_hash (str): Hash of the loaded configuration.
5051
"""
5152
self.configuration_path = configuration_path
5253
self.resource_id = resource_id
5354
self.generation_timestamp = generation_timestamp
54-
self.configuration_checksum = configuration_checksum
55+
self.configuration_hash = configuration_hash
5556
self.path = os.path.join(os.path.dirname(self.configuration_path), "state.yaml")
5657

5758
def as_dict(self):
5859
return {
59-
"configuration_path": self.configuration_path,
6060
"resource_id": self.resource_id,
6161
"generation_timestamp": self.generation_timestamp,
62-
"configuration_checksum": self.configuration_checksum,
62+
"configuration_path": self.configuration_path,
63+
"configuration_hash": self.configuration_hash,
6364
}
6465

6566
def _save(self) -> None:
@@ -68,19 +69,20 @@ def _save(self) -> None:
6869
yaml.dump(self.as_dict(), state_file)
6970

7071
@classmethod
71-
def create(cls, configuration_path: str, resource_id: str) -> "ResourceState":
72+
def create(cls, configuration_path: str, configuration: dict, resource_id: str) -> "ResourceState":
7273
"""Create a state for a resource configuration.
7374
7475
Args:
7576
configuration_path (str): Path to the YAML file defining the resource.
77+
configuration (dict): Configuration object that will be hashed.
7678
resource_id (str): UUID of the resource.
7779
7880
Returns:
7981
ResourceState: state representing the resource.
8082
"""
8183
generation_timestamp = int(time.time())
82-
configuration_checksum = compute_checksum(configuration_path)
83-
state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_checksum)
84+
configuration_hash = hash_config(configuration)
85+
state = ResourceState(configuration_path, resource_id, generation_timestamp, configuration_hash)
8486
state._save()
8587
return state
8688

@@ -100,7 +102,7 @@ def from_file(cls, file_path: str) -> "ResourceState":
100102
raw_state["configuration_path"],
101103
raw_state["resource_id"],
102104
raw_state["generation_timestamp"],
103-
raw_state["configuration_checksum"],
105+
raw_state["configuration_hash"],
104106
)
105107

106108

@@ -194,9 +196,7 @@ def __init__(
194196
self.configuration_path = configuration_path
195197
self.api_instance = self.api(api_client)
196198
self.state = self._get_state_from_file()
197-
self.local_file_changed = (
198-
True if self.state is None else compute_checksum(self.configuration_path) != self.state.configuration_checksum
199-
)
199+
self.local_file_changed = True if self.state is None else hash_config(self.local_configuration) != self.state.configuration_hash
200200

201201
@property
202202
def remote_resource(self):
@@ -274,7 +274,9 @@ def get_diff_with_remote_resource(self) -> str:
274274
return diff.pretty()
275275

276276
def _create_or_update(
277-
self, operation_fn: Callable, payload: Union[SourceCreate, SourceUpdate, DestinationCreate, DestinationUpdate]
277+
self,
278+
operation_fn: Callable,
279+
payload: Union[SourceCreate, SourceUpdate, DestinationCreate, DestinationUpdate],
278280
) -> Union[SourceRead, DestinationRead]:
279281
"""Wrapper to trigger create or update of remote resource.
280282
@@ -291,7 +293,7 @@ def _create_or_update(
291293
"""
292294
try:
293295
result = operation_fn(self.api_instance, payload)
294-
return result, ResourceState.create(self.configuration_path, result[self.resource_id_field])
296+
return result, ResourceState.create(self.configuration_path, self.local_configuration, result[self.resource_id_field])
295297
except airbyte_api_client.ApiException as api_error:
296298
if api_error.status == 422:
297299
# This API response error is really verbose, but it embodies all the details about why the config is not valid.
@@ -417,10 +419,11 @@ def factory(api_client: airbyte_api_client.ApiClient, workspace_id: str, configu
417419
Union[Source, Destination]: The resource object created from the YAML config.
418420
"""
419421
with open(configuration_path, "r") as f:
420-
local_configuration = yaml.load(f, yaml.FullLoader)
422+
local_configuration = yaml.load(f, EnvVarLoader)
421423
if local_configuration["definition_type"] == "source":
422424
return Source(api_client, workspace_id, local_configuration, configuration_path)
423425
if local_configuration["definition_type"] == "destination":
424426
return Destination(api_client, workspace_id, local_configuration, configuration_path)
427+
425428
else:
426429
raise NotImplementedError(f"Resource {local_configuration['definition_type']} was not yet implemented")
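
One consequence of hashing the loaded configuration instead of the file appears to be that the hash now reflects expanded secret values. The sketch below illustrates this under stated assumptions: `load_expanded` is a hypothetical stand-in for `yaml.load(f, EnvVarLoader)` that only handles top-level strings, and the variable values are placeholders.

```python
import hashlib
import json
import os


def hash_config(configuration: dict) -> str:
    # Copied from diff_helpers.py in this commit.
    stringified = json.dumps(configuration, sort_keys=True)
    return hashlib.sha256(stringified.encode("utf-8")).hexdigest()


def load_expanded(raw_configuration: dict) -> dict:
    # Hypothetical stand-in for yaml.load(f, EnvVarLoader): expands
    # ${VAR} references in top-level string values only.
    return {
        key: os.path.expandvars(value) if isinstance(value, str) else value
        for key, value in raw_configuration.items()
    }


# The "file" content never changes in this example.
raw = {"username": "octavia", "password": "${MY_PASSWORD}"}

os.environ["MY_PASSWORD"] = "first-secret"
hash_at_apply = hash_config(load_expanded(raw))

os.environ["MY_PASSWORD"] = "rotated-secret"
hash_after_rotation = hash_config(load_expanded(raw))

# Mirrors the comparison used to set local_file_changed.
local_file_changed = hash_after_rotation != hash_at_apply
print(local_file_changed)
```

Under these assumptions, rotating a secret in the environment marks the resource as changed even though the YAML file is byte-for-byte identical, which the previous file checksum would not have detected.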

octavia-cli/octavia_cli/apply/yaml_loaders.py

+35-0
@@ -0,0 +1,35 @@
1+
#
2+
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
3+
#
4+
5+
import os
6+
import re
7+
from typing import Any
8+
9+
import yaml
10+
11+
ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")
12+
13+
14+
def env_var_replacer(loader: yaml.Loader, node: yaml.Node) -> Any:
15+
"""Convert a YAML node to a Python object, expanding environment variables.
16+
17+
Args:
18+
loader (yaml.Loader): Not used
19+
node (yaml.Node): Yaml node to convert to python object
20+
21+
Returns:
22+
Any: Python object with expanded vars.
23+
"""
24+
return os.path.expandvars(node.value)
25+
26+
27+
class EnvVarLoader(yaml.SafeLoader):
28+
pass
29+
30+
31+
# All yaml nodes matching the regex will be tagged as !environment_variable.
32+
EnvVarLoader.add_implicit_resolver("!environment_variable", ENV_VAR_MATCHER_PATTERN, None)
33+
34+
# All yaml nodes tagged as !environment_variable will be constructed with the env_var_replacer callback.
35+
EnvVarLoader.add_constructor("!environment_variable", env_var_replacer)
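
To see which scalars the implicit resolver would tag, the pattern can be exercised on its own with the standard library. This is a rough sketch rather than the loader itself: the regular expression is copied from the file above, while the sample scalars and the placeholder value are assumptions.

```python
import os
import re

# Pattern copied from the new loader module.
ENV_VAR_MATCHER_PATTERN = re.compile(r".*\$\{([^}^{]+)\}.*")

os.environ["MY_PASSWORD"] = "s3cr3t"  # placeholder value for the example

# Hypothetical scalar values as they might appear in a configuration file.
scalars = ["${MY_PASSWORD}", "prefix-${MY_PASSWORD}-suffix", "$MY_PASSWORD", "plain text"]

results = {}
for scalar in scalars:
    if ENV_VAR_MATCHER_PATTERN.match(scalar):
        # Matching scalars are the ones the constructor would expand.
        results[scalar] = os.path.expandvars(scalar)
    else:
        # Everything else keeps its literal value.
        results[scalar] = scalar

for scalar, value in results.items():
    print(f"{scalar!r} -> {value!r}")
```

Only the braced `${VAR}` form matches, so a bare `$MY_PASSWORD` is left as a literal string. The second `^` inside the character class is a literal caret, which means variable names containing `^` are also rejected.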

octavia-cli/octavia_cli/generate/renderer.py

+3-1
@@ -89,7 +89,7 @@ def _get_type_comment(self) -> str:
8989
return self.type if self.type else None
9090

9191
def _get_secret_comment(self) -> str:
92-
return "SECRET" if self.airbyte_secret else None
92+
return "SECRET (please store in environment variables)" if self.airbyte_secret else None
9393

9494
def _get_description_comment(self) -> str:
9595
return self.description if self.description else None
@@ -109,6 +109,8 @@ def _get_example_comment(self) -> str:
109109
def _get_default(self) -> str:
110110
if self.const:
111111
return self.const
112+
if self.airbyte_secret:
113+
return f"${{{self.name.upper()}}}"
112114
return self.default
113115

114116
@staticmethod
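
The f-string added to `_get_default` relies on brace escaping, which can be hard to read at a glance. A small sketch of how it evaluates is shown here; `secret_placeholder` is a hypothetical helper wrapping the same expression, and the field names are taken from the fixtures in this commit.

```python
def secret_placeholder(field_name: str) -> str:
    # Hypothetical helper: same f-string as the one added to _get_default.
    # "{{" and "}}" are escaped braces, so only the name is interpolated.
    return f"${{{field_name.upper()}}}"


placeholders = [secret_placeholder(name) for name in ("password", "ssh_key", "secret_access_key")]
print(placeholders)
```

Because the placeholder is derived from the field name alone, a source and a destination that both have a `password` field default to the same `${PASSWORD}` variable, as the two Postgres fixtures above show. Renaming one of them in the generated YAML is probably needed when the credentials differ.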

octavia-cli/octavia_cli/templates/source_or_destination.yaml.j2

+1-1
@@ -7,7 +7,7 @@ definition_image: {{ definition.docker_repository }}
77
definition_version: {{ definition.docker_image_tag }}
88

99
{%- macro render_field(field, is_commented) %}
10-
{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {{ field.default | tojson() }}{% endif %} # {{ field.comment }}
10+
{%- if is_commented %}# {% endif %}{{ field.name }}:{% if field.default %} {% if field.airbyte_secret %}{{ field.default }}{% else %}{{ field.default | tojson() }}{% endif %}{% endif %} # {{ field.comment }}
1111
{%- endmacro %}
1212

1313
{%- macro render_sub_fields(sub_fields, is_commented) %}

octavia-cli/unit_tests/test_apply/test_diff_helpers.py

+3-7
@@ -2,17 +2,13 @@
22
# Copyright (c) 2021 Airbyte, Inc., all rights reserved.
33
#
44

5-
from unittest.mock import mock_open, patch
6-
75
import pytest
86
from octavia_cli.apply import diff_helpers
97

108

11-
def test_compute_checksum(mocker):
12-
with patch("builtins.open", mock_open(read_data=b"data")) as mock_file:
13-
digest = diff_helpers.compute_checksum("test_file_path")
14-
assert digest == "3a6eb0790f39ac87c94f3856b2dd2c5d110e6811602261a9a923d3bb23adc8b7"
15-
mock_file.assert_called_with("test_file_path", "rb")
9+
def test_hash_config():
10+
data_to_hash = {"example": "foo"}
11+
assert diff_helpers.hash_config(data_to_hash) == "8d621bd700ff9a864bc603f56b4ec73536110b37d814dd4629767e898da70bef"
1612

1713

1814
@pytest.mark.parametrize(
