Skip to content

🎉 Migrate config persistence to database #4670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 45 commits into from
Jul 19, 2021
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
6ec78f2
Implement db config persistence
tuliren Jul 9, 2021
947d03a
Fix database readiness check
tuliren Jul 9, 2021
e83c373
Reduce logging noise
tuliren Jul 9, 2021
dc86c62
Setup config database in config persistence factory
tuliren Jul 12, 2021
648ce17
Update documentation
tuliren Jul 13, 2021
e72843c
Load seed from yaml files
tuliren Jul 14, 2021
1067aa4
Refactor config persistence factory
tuliren Jul 14, 2021
607bf07
Add one more test to mimic migration
tuliren Jul 14, 2021
33ed1cd
Remove unnecessary changes
tuliren Jul 14, 2021
19a8f2e
Run code formatter
tuliren Jul 14, 2021
3a9db90
Merge branch 'master' into liren/db-config-persistence-v2
tuliren Jul 14, 2021
6fb38bf
Update placeholder env values
tuliren Jul 14, 2021
576f28d
Set default config database parameters in docker compose
tuliren Jul 15, 2021
8b85a03
Default setupDatabase to false
tuliren Jul 15, 2021
d62a72b
Rename variable
tuliren Jul 15, 2021
d7e4815
Set default config db parameters for server
tuliren Jul 15, 2021
91f2f77
Remove config db parameters from the env file
tuliren Jul 15, 2021
dc14826
Remove unnecessary environment statements
tuliren Jul 15, 2021
9be88a3
Merge branch 'master' into liren/db-config-persistence-v2
tuliren Jul 15, 2021
991a374
Hide config persistence factory (#4772)
cgardens Jul 15, 2021
7b845ce
Remove CONFIG_DATABASE_HOST
tuliren Jul 15, 2021
c2f9275
Use builder in the test
tuliren Jul 15, 2021
865e11b
Simplify config persistence builder
tuliren Jul 15, 2021
d88e663
Clarify config db connection readiness
tuliren Jul 15, 2021
ee9b3ea
Format code
tuliren Jul 15, 2021
1d7741e
Add logging
tuliren Jul 16, 2021
c29bb7e
Fix typo
tuliren Jul 17, 2021
ad9c982
Merge branch 'master' into liren/db-config-persistence-v2
tuliren Jul 17, 2021
0c68bc0
Add a config_id only index
tuliren Jul 17, 2021
86ee51e
Reuse record insertion code
tuliren Jul 18, 2021
d6b95b2
Add id field name to config schema
tuliren Jul 18, 2021
03ac64a
Support data loading from legacy config schemas
tuliren Jul 18, 2021
7537864
Log missing logs in migration test
tuliren Jul 18, 2021
024253b
Move airbyte configs table to separate directory
tuliren Jul 18, 2021
ae638ef
Update exception message
tuliren Jul 18, 2021
0d22400
Dump specific tables from the job database
tuliren Jul 18, 2021
c0ce70d
Remove postgres specific uuid extension
tuliren Jul 19, 2021
d76f58f
Comment out future branch
tuliren Jul 19, 2021
3d99a73
Default configs db variables to empty
tuliren Jul 19, 2021
d99cda7
Merge master into liren/db-config-persistence-v2
tuliren Jul 19, 2021
449cb30
Log inserted config records
tuliren Jul 19, 2021
d37b9fc
Log all db write operations
tuliren Jul 19, 2021
3461382
Add back config db variables in env file to mute warnings
tuliren Jul 19, 2021
fb0638e
Log connection exception to debug flaky e2e test
tuliren Jul 19, 2021
10cabfa
Leave config db variables empty
tuliren Jul 19, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion .env
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
VERSION=0.27.2-alpha

# Airbyte Internal Database, see https://docs.airbyte.io/operator-guides/configuring-airbyte-db
# Airbyte Internal Job Database, see https://docs.airbyte.io/operator-guides/configuring-airbyte-db
DATABASE_USER=docker
DATABASE_PASSWORD=docker
DATABASE_HOST=db
Expand All @@ -9,6 +9,15 @@ DATABASE_DB=airbyte
# translate manually DATABASE_URL=jdbc:postgresql://${DATABASE_HOST}:${DATABASE_PORT/${DATABASE_DB}
DATABASE_URL=jdbc:postgresql://db:5432/airbyte

# By default, the Config Database is the same as the Job Database.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the increased optionality around a custom config database here. Is there a reason we want to encourage separating this?

For users that were previously on the job database for configs, we don't want to handle migrating configs to the config database or having multiple versions of configs sitting around if we can help it.

It isn't clear to me what the benefit is, and it seems to just increase operational complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do want to have the option to separate the two databases. But I don't think we want to encourage it, at least for self-deployed instances.

The benefits of separating the two databases are:

  1. The jobs database and the configs database have different usage patterns. The former can grow much quicker than the latter (because we track job attempts there).
  2. The configs database contain secrets. So people may want extra security procedure around it.

What about if I remove the comments in the .env file, and only mention it in the documentation? In this way, we are not encouraging people to do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing them from the .env will result in warning messages about missing environment variables. So I added them back with comments:

# Airbyte Internal Config Database, default to reuse the Job Database
# Usually you do not need to change them
CONFIG_DATABASE_USER=${DATABASE_USER}
CONFIG_DATABASE_PASSWORD=${DATABASE_PASSWORD}
CONFIG_DATABASE_URL=${DATABASE_URL}

# Uncomment the following variable if a different Config Database is needed.
# CONFIG_DATABASE_USER=docker
# CONFIG_DATABASE_PASSWORD=docker
# CONFIG_DATABASE_HOST=db
# CONFIG_DATABASE_PORT=5432
# CONFIG_DATABASE_DB=airbyte
# CONFIG_DATABASE_URL=jdbc:postgresql://db:5432/airbyte

# When using the airbyte-db via default docker image:
CONFIG_ROOT=/data
DATA_DOCKER_MOUNT=airbyte_data
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
- workspaceId: 5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6
name: default
slug: default
initialSetupComplete: false
displaySetupWizard: true
tombstone: false
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,12 @@ public interface Configs {

String getDatabaseUrl();

String getConfigDatabaseUser();

String getConfigDatabasePassword();

String getConfigDatabaseUrl();

String getWebappUrl();

String getWorkspaceDockerMount();
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ public class EnvConfigs implements Configs {
public static final String DATABASE_USER = "DATABASE_USER";
public static final String DATABASE_PASSWORD = "DATABASE_PASSWORD";
public static final String DATABASE_URL = "DATABASE_URL";
public static final String CONFIG_DATABASE_USER = "CONFIG_DATABASE_USER";
public static final String CONFIG_DATABASE_PASSWORD = "CONFIG_DATABASE_PASSWORD";
public static final String CONFIG_DATABASE_URL = "CONFIG_DATABASE_URL";
public static final String WEBAPP_URL = "WEBAPP_URL";
private static final String MINIMUM_WORKSPACE_RETENTION_DAYS = "MINIMUM_WORKSPACE_RETENTION_DAYS";
private static final String MAXIMUM_WORKSPACE_RETENTION_DAYS = "MAXIMUM_WORKSPACE_RETENTION_DAYS";
Expand Down Expand Up @@ -127,6 +130,24 @@ public String getDatabaseUrl() {
return getEnsureEnv(DATABASE_URL);
}

@Override
public String getConfigDatabaseUser() {
// Default to reuse the job database
return getEnvOrDefault(CONFIG_DATABASE_USER, getDatabaseUser());
}

@Override
public String getConfigDatabasePassword() {
// Default to reuse the job database
return getEnvOrDefault(CONFIG_DATABASE_PASSWORD, getDatabasePassword());
}

@Override
public String getConfigDatabaseUrl() {
// Default to reuse the job database
return getEnvOrDefault(CONFIG_DATABASE_URL, getDatabaseUrl());
}

@Override
public String getWebappUrl() {
return getEnsureEnv(WEBAPP_URL);
Expand Down Expand Up @@ -255,10 +276,10 @@ private long getEnvOrDefault(String key, long defaultValue) {

private <T> T getEnvOrDefault(String key, T defaultValue, Function<String, T> parser) {
final String value = getEnv.apply(key);
if (value != null) {
if (value != null && !value.isEmpty()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This has bitten us a few times...

return parser.apply(value);
} else {
LOGGER.info(key + " not found, defaulting to " + defaultValue);
LOGGER.info(key + " not found or empty, defaulting to " + defaultValue);
return defaultValue;
}
}
Expand Down
6 changes: 5 additions & 1 deletion airbyte-config/persistence/build.gradle
Original file line number Diff line number Diff line change
@@ -1,6 +1,10 @@
dependencies {
implementation group: 'commons-io', name: 'commons-io', version: '2.7'

implementation project(':airbyte-db')
implementation project(':airbyte-config:models')
implementation project(":airbyte-json-validation")
implementation project(':airbyte-config:init')
implementation project(':airbyte-json-validation')

testImplementation "org.testcontainers:postgresql:1.15.1"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
/*
* MIT License
*
* Copyright (c) 2020 Airbyte
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

package io.airbyte.config.persistence;

import static org.jooq.impl.DSL.field;
import static org.jooq.impl.DSL.table;

import java.sql.Timestamp;
import org.jooq.Field;
import org.jooq.JSONB;
import org.jooq.Record;
import org.jooq.Table;

public class AirbyteConfigsTable {

public static final String AIRBYTE_CONFIGS_TABLE_SCHEMA = "airbyte_configs_table.sql";

public static final Table<Record> AIRBYTE_CONFIGS = table("airbyte_configs");
public static final Field<String> CONFIG_ID = field("config_id", String.class);
public static final Field<String> CONFIG_TYPE = field("config_type", String.class);
public static final Field<JSONB> CONFIG_BLOB = field("config_blob", JSONB.class);
public static final Field<Timestamp> CREATED_AT = field("created_at", Timestamp.class);
public static final Field<Timestamp> UPDATED_AT = field("updated_at", Timestamp.class);

}
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
/*
* MIT License
*
* Copyright (c) 2020 Airbyte
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*/

package io.airbyte.config.persistence;

import static io.airbyte.config.persistence.AirbyteConfigsTable.AIRBYTE_CONFIGS_TABLE_SCHEMA;

import io.airbyte.commons.resources.MoreResources;
import io.airbyte.config.Configs;
import io.airbyte.db.Database;
import io.airbyte.db.Databases;
import java.io.IOException;
import java.nio.file.Path;
import java.util.function.Function;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* By default, this factory returns a database config persistence. it can still return a file system
* config persistence for testing purpose. This legacy feature should be removed after the file to
* database migration is completely done.
Copy link
Contributor

@ChristopheDuong ChristopheDuong Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it would be really nice to keep a working file-based config persistence and not necessarily remove it permanently...

If this file config persistence reads/writes from a docker volume that is mounted from a local folder of the filesystem (like what the LOCAL_ROOT does) rather than a named docker volume, then a user could configure a CI running as part of their git repository versioning their airbyte setup. This setup could include (tests) configurations and once the CI is successful, they could publish in the same workflow their final changes to a remote Airbyte Database config persistence that would affect their airbyte instance in production for example.

WDYT?

Copy link
Contributor

@jrhizor jrhizor Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what benefits do you see from working with a file-based config persistence vs just using import/export capabilities?

Copy link
Contributor Author

@tuliren tuliren Jul 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what Chris means is that if we keep the file config persistence code, users can still manipulate the configs using the code?

Copy link
Contributor

@ChristopheDuong ChristopheDuong Jul 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For import/export capabilities, we'd have what is currently exposed as:

  • getDbPersistenceWithYamlSeed: (standard files released with airbyte)
  • getDbPersistenceWithFileSeed: (old named docker volumes)

In the future, I am guessing we could have something like these?

  • getDbPersistenceWithYamlSeedFromGitRepo: clones a remote git repo where seeds are stored to re-init the DB from those
  • getDbPersistenceWithYamlSeedFromArchive: same but from a local/remote .tar.gz archive file
  • getDbPersistenceWithYamlSeedFromLocalFolder: same but from a local folder (no need to compress into an archive or push to a repo?)

Just curious, what benefits do you see from working with a file-based config persistence vs just using import/export capabilities?
if we keep the file config persistence code, users can still manipulate the configs using the code?

Yes.

For dev/test purposes, you can more easily edit/test your airbyte configurations locally by manually/programmatically editing the configs in parallel without having to restart airbyte all the time to re-init the database (re-deploy your changes)?

Plus the code is almost already written for that, so it could be worth keeping around
Of course, for production usage, it's better to go through the database

*/
public class ConfigPersistenceFactory {

private static final Logger LOGGER = LoggerFactory.getLogger(ConfigPersistenceFactory.class);

private final Configs configs;
private final boolean setupDatabase;
private final boolean useConfigDatabase;

/**
* @param setupDatabase initialize the database and load data; this is necessary because this method
* has multiple callers, and we want to setup the database only once to prevent race
* conditions.
*/
private ConfigPersistenceFactory(Configs configs, boolean setupDatabase, boolean useConfigDatabase) {
this.configs = configs;
this.setupDatabase = setupDatabase;
this.useConfigDatabase = useConfigDatabase;
}

public static Builder build(Configs configs) {
return new Builder(configs);
}

public static class Builder {

private final Configs configs;
private boolean setupDatabase = true;
private boolean useConfigDatabase = true;

private Builder(Configs configs) {
this.configs = configs;
}

public Builder setupDatabase(boolean setupDatabase) {
this.setupDatabase = setupDatabase;
return this;
}

public Builder useConfigDatabase(boolean useConfigDatabase) {
this.useConfigDatabase = useConfigDatabase;
return this;
}

public ConfigPersistenceFactory get() {
return new ConfigPersistenceFactory(configs, setupDatabase, useConfigDatabase);
}

}

/**
* Create a config persistence based on the configs.
* <p/>
* If config root is defined, create a database config persistence and copy the configs from the
* file-based config persistence. Otherwise, seed the database from the yaml files.
*/
public ConfigPersistence create() throws IOException {
if (!useConfigDatabase) {
Path configRoot = configs.getConfigRoot();
LOGGER.info("Use file system config persistence (root: {})", configRoot);
return FileSystemConfigPersistence.createWithValidation(configRoot);
}

if (configs.getConfigRoot() == null) {
// This branch will only be true in a future Airbyte version, in which
// the config root is no longer required and everything lives in the database.
return createDbPersistenceWithYamlSeed();
}

return createDbPersistenceWithFileSeed();
}

ConfigPersistence createDbPersistenceWithYamlSeed() throws IOException {
ConfigPersistence seedConfigPersistence = new YamlSeedConfigPersistence();
return createDbPersistence(seedConfigPersistence);
}

ConfigPersistence createDbPersistenceWithFileSeed() throws IOException {
Path configRoot = configs.getConfigRoot();
ConfigPersistence fsConfigPersistence = FileSystemConfigPersistence.createWithValidation(configRoot);
return createDbPersistence(fsConfigPersistence);
}

ConfigPersistence createDbPersistence(ConfigPersistence seedConfigPersistence) throws IOException {
LOGGER.info("Use database config persistence.");

// When setupDatabase is true, it means the database will be initialized after we
// connect to the database. So the database itself is considered ready as long as
// the connection is alive. Otherwise, the database is expected to have full data.
Function<Database, Boolean> isReady = setupDatabase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it took me kinda a long time to reason about this. I wonder if it would be easier to understand if we just have two methods in databases createPostgresDatabaseWithRetry and createPostgresDatabaseWithRetryBlockUntilDatabaseIsSeeded. The latter is obviously a trash name but illustrates what I am trying to clarify.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the queries we need to run to check if job database and config database are ready are different. So creating two methods in Databases won't solve this problem. I can try to simplify the method inside the ConfigPersistenceBuilder.

? Databases.IS_CONFIG_DATABASE_CONNECTED
: Databases.IS_CONFIG_DATABASE_LOADED_WITH_DATA;

Database database = Databases.createPostgresDatabaseWithRetry(
configs.getConfigDatabaseUser(),
configs.getConfigDatabasePassword(),
configs.getConfigDatabaseUrl(),
isReady);

DatabaseConfigPersistence dbConfigPersistence = new DatabaseConfigPersistence(database);
if (setupDatabase) {
dbConfigPersistence.initialize(MoreResources.readResource(AIRBYTE_CONFIGS_TABLE_SCHEMA));
dbConfigPersistence.loadData(seedConfigPersistence);
}

return new ValidatingConfigPersistence(dbConfigPersistence);
}

}
Loading