
When using AWS Secrets Manager for persisting secrets, updating a connector deletes its secret #46097

Closed
cjwooo opened this issue Sep 27, 2024 · 24 comments

@cjwooo
Contributor

cjwooo commented Sep 27, 2024

Helm Chart Version

0.64.388

What step the error happened?

Other

Relevant information

Observed behavior:

  1. Create an Airbyte Source that has at least one secret param. Check the database for the ID of the Airbyte secret and confirm that it exists in AWS Secrets Manager.
  2. Update the Airbyte Source config. Regardless of whether a new value is provided for the secret param, the secret in Secrets Manager gets deleted.
  3. Any subsequent Source config update fails because the secret has been deleted. If a new value was provided in step 2, all subsequent syncs with that Source fail. If a new value was not provided, syncs still pass -- is Airbyte caching the secret value?

I believe I have tracked the cause to this commit. The related deletion behavior was behind a feature flag that defaulted to false, but the commit removed the flag and enabled the deletion behavior permanently.

AFAICT, the problem with the Secrets Manager integration is that the version suffix Airbyte adds to the IDs (e.g. <secret-id-base>_v1, _v2, etc.) is not respected when writing to AWS Secrets Manager, so none of the secrets are versioned properly. As a result, when Airbyte thinks it's deleting an old secret after a connector config update, it's actually deleting the required secret, as there is only one.
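To make the failure mode concrete, here is a minimal boto3 sketch of the two behaviors (hypothetical, not Airbyte's actual persistence code; the function names and the way the _vN suffix is handled are assumptions for illustration only):

```python
import boto3

client = boto3.client("secretsmanager")


def write_respecting_version(coordinate_base: str, version: int, value: str) -> None:
    # If the _vN suffix is kept, each coordinate version is its own secret
    # (<secret-id-base>_v1, <secret-id-base>_v2, ...), so deleting the previous
    # version after a config update leaves the current one intact.
    client.create_secret(Name=f"{coordinate_base}_v{version}", SecretString=value)


def write_ignoring_version(coordinate_base: str, value: str) -> None:
    # If the suffix is dropped, every version collapses onto a single secret name,
    # and "delete the old version" deletes the only copy that exists.
    try:
        client.create_secret(Name=coordinate_base, SecretString=value)
    except client.exceptions.ResourceExistsException:
        client.put_secret_value(SecretId=coordinate_base, SecretString=value)
```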

While we are using an older chart version, and I am not familiar enough with the platform codebase to say anything concrete about it, I don't think this behavior has changed between our version and 1.0.

Relevant log output

No response

@tovbinm

tovbinm commented Oct 1, 2024

@natikgadzhi @nataliekwong can you please prioritize the fix?

@natikgadzhi
Contributor

That's @bgroff's area.

@tovbinm

tovbinm commented Oct 7, 2024

@bgroff, are there any updates on this? Is there someone else who can handle the fix for the issue?

@malikdiarra
Contributor

@cjwooo Thanks for reporting this. Are you performing the creation and update of the source through the public API?

@malikdiarra malikdiarra self-assigned this Oct 14, 2024
@tovbinm

tovbinm commented Oct 14, 2024

We are using the update source API from the OpenAPI spec: POST /v1/sources/update (body: SourceUpdate).
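For reference, a minimal sketch of that call (the host, auth setup, and connectionConfiguration fields below are placeholders; only the endpoint and the SourceUpdate body type come from the comment above):

```python
import requests

# Placeholder host and payload; adjust to your deployment and source schema.
resp = requests.post(
    "http://localhost:8001/api/v1/sources/update",
    json={
        "sourceId": "00000000-0000-0000-0000-000000000000",
        "name": "my-source",
        # Secret params live inside connectionConfiguration; sending this update
        # is what triggers the secret deletion described in this issue.
        "connectionConfiguration": {"host": "db.example.com", "password": "*****"},
    },
    timeout=30,
)
resp.raise_for_status()
```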

@GregMisfitsMarket

I'm also experiencing issues with AWS Secrets Manager and _v2-related errors, e.g.

2024-10-17 14:04:23 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToLaunched):60 - Attempting to update workload: b6fec24d-5870-410a-afa8-902ee252bf20_776_4_check to LAUNCHED.
2024-10-17 14:04:23 INFO i.a.w.l.p.h.SuccessHandler(accept):60 - Pipeline completed for workload: b6fec24d-5870-410a-afa8-902ee252bf20_776_4_check.
2024-10-17 14:04:24 INFO i.a.w.l.c.WorkloadApiClient(claim):75 - Claimed: true for 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check via API for local
2024-10-17 14:04:24 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: CHECK_STATUS — (workloadId = 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check) — (dataplaneId = local)
2024-10-17 14:04:24 INFO i.a.w.l.p.s.CheckStatusStage(applyStage):59 - No pod found running for workload 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check
2024-10-17 14:04:24 INFO i.a.w.l.p.s.m.Stage(apply):39 - APPLY Stage: BUILD — (workloadId = 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check) — (dataplaneId = local)
2024-10-17 14:04:24 WARN i.a.c.s.p.AwsSecretManagerPersistence(read):48 - Secret airbyte_workspace_f15641e6-2335-4684-932d-fcedd458c084_secret_d119d09b-b779-4fa6-8aff-c22d8494b7c7 not found
2024-10-17 14:04:24 ERROR i.a.w.l.p.h.FailureHandler(apply):39 - Pipeline Error
io.airbyte.workload.launcher.pipeline.stages.model.StageError: java.lang.RuntimeException: That secret was not found in the store! Coordinate: airbyte_workspace_f15641e6-2335-4684-932d-fcedd458c084_secret_d119d09b-b779-4fa6-8aff-c22d8494b7c7_v2
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:46) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.BuildInputStage.apply(BuildInputStage.kt:58) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.$BuildInputStage$Definition$Intercepted.$$access$$apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.$BuildInputStage$Definition$Exec.dispatch(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.micronaut.context.AbstractExecutableMethodsDefinition$DispatchedExecutableMethod.invoke(AbstractExecutableMethodsDefinition.java:456) ~[micronaut-inject-4.6.5.jar:4.6.5]
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:134) ~[micronaut-aop-4.6.5.jar:4.6.5]
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.doIntercept(InstrumentInterceptorBase.kt:61) ~[io.airbyte.airbyte-metrics-metrics-lib-1.1.0.jar:?]
	at io.airbyte.metrics.interceptors.InstrumentInterceptorBase.intercept(InstrumentInterceptorBase.kt:44) ~[io.airbyte.airbyte-metrics-metrics-lib-1.1.0.jar:?]
	at io.micronaut.aop.chain.MethodInterceptorChain.proceed(MethodInterceptorChain.java:143) ~[micronaut-aop-4.6.5.jar:4.6.5]
	at io.airbyte.workload.launcher.pipeline.stages.$BuildInputStage$Definition$Intercepted.apply(Unknown Source) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.BuildInputStage.apply(BuildInputStage.kt:39) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:132) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onNext(MonoFlatMap.java:158) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Operators$ScalarSubscription.request(Operators.java:2571) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.request(MonoFlatMap.java:194) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Operators$MultiSubscriptionSubscriber.set(Operators.java:2367) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onSubscribe(FluxOnErrorResume.java:74) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap$FlatMapMain.onSubscribe(MonoFlatMap.java:117) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.FluxFlatMap.trySubscribeScalarMap(FluxFlatMap.java:193) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoFlatMap.subscribeOrReturn(MonoFlatMap.java:53) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4560) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoSubscribeOn$SubscribeOnSubscriber.run(MonoSubscribeOn.java:126) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.scheduler.ImmediateScheduler$ImmediateSchedulerWorker.schedule(ImmediateScheduler.java:84) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.MonoSubscribeOn.subscribeOrReturn(MonoSubscribeOn.java:55) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4560) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Mono.subscribeWith(Mono.java:4642) ~[reactor-core-3.6.9.jar:3.6.9]
	at reactor.core.publisher.Mono.subscribe(Mono.java:4403) ~[reactor-core-3.6.9.jar:3.6.9]
	at io.airbyte.workload.launcher.pipeline.LaunchPipeline.accept(LaunchPipeline.kt:50) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:28) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.consumer.LauncherMessageConsumer.consume(LauncherMessageConsumer.kt:12) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.commons.temporal.queue.QueueActivityImpl.consume(Internal.kt:87) ~[io.airbyte-airbyte-commons-temporal-core-1.1.0.jar:?]
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:580) ~[?:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor$POJOActivityInboundCallsInterceptor.executeActivity(RootActivityInboundCallsInterceptor.java:64) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.RootActivityInboundCallsInterceptor.execute(RootActivityInboundCallsInterceptor.java:43) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.common.interceptors.ActivityInboundCallsInterceptorBase.execute(ActivityInboundCallsInterceptorBase.java:39) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.opentracing.internal.OpenTracingActivityInboundCallsInterceptor.execute(OpenTracingActivityInboundCallsInterceptor.java:78) ~[temporal-opentracing-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskExecutors$BaseActivityTaskExecutor.execute(ActivityTaskExecutors.java:107) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.activity.ActivityTaskHandlerImpl.handle(ActivityTaskHandlerImpl.java:124) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handleActivity(ActivityWorker.java:278) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:243) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.ActivityWorker$TaskHandlerImpl.handle(ActivityWorker.java:216) ~[temporal-sdk-1.22.3.jar:?]
	at io.temporal.internal.worker.PollTaskExecutor.lambda$process$0(PollTaskExecutor.java:105) ~[temporal-sdk-1.22.3.jar:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
	at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.lang.RuntimeException: That secret was not found in the store! Coordinate: airbyte_workspace_f15641e6-2335-4684-932d-fcedd458c084_secret_d119d09b-b779-4fa6-8aff-c22d8494b7c7_v2
	at io.airbyte.config.secrets.SecretsHelpers.getOrThrowSecretValue(SecretsHelpers.kt:288) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.SecretsHelpers.combineConfig(SecretsHelpers.kt:173) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.SecretsHelpers$combineConfig$1.invoke(SecretsHelpers.kt:183) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.SecretsHelpers$combineConfig$1.invoke(SecretsHelpers.kt:177) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.SecretsHelpers.combineConfig$lambda$2(SecretsHelpers.kt:177) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133) ~[?:?]
	at io.airbyte.config.secrets.SecretsHelpers.combineConfig(SecretsHelpers.kt:177) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.hydration.RealSecretsHydrator.hydrateFromDefaultSecretPersistence(RealSecretsHydrator.kt:21) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.config.secrets.SecretsRepositoryReader.hydrateConfigFromDefaultSecretPersistence(SecretsRepositoryReader.kt:60) ~[io.airbyte.airbyte-config-config-secrets-1.1.0.jar:?]
	at io.airbyte.workers.ConnectorSecretsHydrator.hydrateConfig(ConnectorSecretsHydrator.kt:33) ~[io.airbyte-airbyte-commons-worker-1.1.0.jar:?]
	at io.airbyte.workers.CheckConnectionInputHydrator.getHydratedStandardCheckInput(CheckConnectionInputHydrator.kt:13) ~[io.airbyte-airbyte-commons-worker-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.BuildInputStage.buildPayload(BuildInputStage.kt:90) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.BuildInputStage.applyStage(BuildInputStage.kt:62) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.BuildInputStage.applyStage(BuildInputStage.kt:39) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	at io.airbyte.workload.launcher.pipeline.stages.model.Stage.apply(Stage.kt:42) ~[io.airbyte-airbyte-workload-launcher-1.1.0.jar:?]
	... 51 more
2024-10-17 14:04:24 INFO i.a.w.l.c.WorkloadApiClient(updateStatusToFailed):54 - Attempting to update workload: 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check to FAILED.
2024-10-17 14:04:24 INFO i.a.w.l.p.h.FailureHandler(apply):62 - Pipeline aborted after error for workload: 2c4f2192-796d-4c56-b0a7-daf93c96c507_776_4_check.
2024-10-17 14:04:25 platform > Failing job: 776, reason: Job failed after too many retries for connection 239938ef-49a4-4d48-ba6d-9d50553eb962

@GregMisfitsMarket

FWIW - still seeing this behavior after upgrading to v1.3.0

@giacomochiarella

giacomochiarella commented Jan 8, 2025

I'm having this issue too on Airbyte 1.3.0.
Assuming each ingestion runs at least once a day, what if we remove the secretsmanager:DeleteSecret permission from the IAM role set in values.yaml, and have a script delete every secret that either
has not been accessed and has not been changed for more than 2 days,
or
has an ID starting with airbyte_workspace_00000000?
This is easily extendable to ingestions triggered less frequently by just increasing the number of days. Could this be a good workaround? A rough sketch of such a script is below.
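A hedged sketch of that cleanup script (not an official Airbyte tool; the 2-day threshold and the airbyte_workspace_00000000 prefix come from this comment, while the airbyte_ name filter is an extra safety assumption so unrelated secrets in the account are never touched):

```python
from datetime import datetime, timedelta, timezone

import boto3

client = boto3.client("secretsmanager")
cutoff = datetime.now(timezone.utc) - timedelta(days=2)
EPHEMERAL_PREFIX = "airbyte_workspace_00000000"

# Only consider Airbyte-managed secrets; the "name" filter does prefix matching.
pages = client.get_paginator("list_secrets").paginate(
    Filters=[{"Key": "name", "Values": ["airbyte_"]}]
)
for page in pages:
    for secret in page["SecretList"]:
        name = secret["Name"]
        last_accessed = secret.get("LastAccessedDate")
        last_changed = secret.get("LastChangedDate")
        stale = (
            last_accessed is not None
            and last_changed is not None
            and last_accessed < cutoff
            and last_changed < cutoff
        )
        if name.startswith(EPHEMERAL_PREFIX) or stale:
            # Run this with credentials that still hold secretsmanager:DeleteSecret,
            # since the Airbyte role would no longer have it. Drop
            # ForceDeleteWithoutRecovery to keep the default recovery window.
            client.delete_secret(SecretId=name, ForceDeleteWithoutRecovery=True)
```

This assumes every connection actually runs inside the window, so the threshold should grow with the least frequent ingestion schedule.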

@marcosmarxm
Member

Hello, the team is investigating the issue and trying to reproduce it in order to implement a fix. I'll let you know of any updates.

@SkinnyPigeon
Contributor

Hey, we've got this exact issue as well. If anyone makes a change to a Source, it breaks any connections that use it. We are on 1.2.0. More users are starting to be onboarded, so this is likely to become critical.

@bHacklv

bHacklv commented Feb 4, 2025

@marcosmarxm any update on this?
We are using Airbyte 1.4.0 and we've observed the same behaviour, where sync jobs stop working when a change is made to a source.

Image

@JonsSpaghetti
Contributor

Hey folks, we have a PR up internally with a fix for this. It should get merged this week and go out in the 1.5.0 release, which should happen next week. Stay tuned.

@JonsSpaghetti
Contributor

JonsSpaghetti commented Feb 12, 2025

The fix has been merged; expect it to go out with 1.5.0 next week. Thanks for reporting this and for the patience here, everyone.

Note that this will take effect for new connections; since the old secrets were saved incorrectly, it's difficult for us to remediate them. We're trying to find a way to mitigate for existing connections, but it isn't straightforward.

Edit: We've got a way to mitigate for existing connections.

Summary: For sources/destinations that haven't been touched, there should be no issues either continuing to run connections or updating secrets/config after upgrading to 1.5.0.
For any failure states (secrets were deleted because of the bug), just update your source/destination's secrets after upgrading to 1.5.0 and they should persist correctly.

@giacomochiarella

giacomochiarella commented Feb 24, 2025

@JonsSpaghetti I'm testing Airbyte 1.5.0. I've noticed that if you change a secret (e.g. a destination password), Airbyte deletes the current one, but for some reason it always creates a new secret with the new password and another secret with the old password.
Example:
First time: it creates secret0.
Change the password: it deletes secret0 and creates secret1 with the current password and secret2 with the new password.
Change the password again: it deletes secret2 and creates secret3 with the current password and secret4 with the new password.
I ended up with 4 secrets after 2 password changes, and if I keep going it continues creating 2 secrets per change.
Is that a bug?

@SkinnyPigeon
Contributor

SkinnyPigeon commented Feb 24, 2025

It costs $0.40 per secret per month in AWS Secrets Manager. It would be nice to keep these to a minimum if at all possible 🙏

@JonsSpaghetti
Contributor

Unfortunately, yes: the extra secrets you see getting created are from our check_connection. We have updates to make to ensure that those are expired and cleaned up correctly, which we will do in a follow-up.

@giacomochiarella

@JonsSpaghetti in the meantime, until this is fixed, can we assume Airbyte will never use the secrets containing the old password, and delete them by just filtering on last accessed date? Assuming all ingestions are scheduled to run at least once per day, can we safely delete all secrets with a last accessed date older than 2 or 3 days?

@JonsSpaghetti
Contributor

JonsSpaghetti commented Feb 24, 2025

@giacomochiarella I'd recommend looking at the naming scheme of the secret instead.

The secrets with a name like airbyte_workspace_00000000-0000-0000-0000-000000000000_secret_* should be ones which are ephemeral - once they've been accessed (usually only once) they should be safe to clean up.

Writing secrets with expiry is supported in our existing SecretPersistence interface, but it's only implemented for some of our supported secret managers. I'll add an item to our backlog to make sure we support it for AWS as well so that these secrets will be cleaned up automatically. I'll link the new ticket here.
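Until that automatic cleanup exists, a hedged boto3 sketch targeting only those ephemeral check secrets (the prefix comes from the comment above; the one-day threshold is an assumption):

```python
from datetime import datetime, timedelta, timezone

import boto3

client = boto3.client("secretsmanager")
EPHEMERAL_PREFIX = "airbyte_workspace_00000000-0000-0000-0000-000000000000_secret_"
cutoff = datetime.now(timezone.utc) - timedelta(days=1)

paginator = client.get_paginator("list_secrets")
for page in paginator.paginate(Filters=[{"Key": "name", "Values": [EPHEMERAL_PREFIX]}]):
    for secret in page["SecretList"]:
        accessed = secret.get("LastAccessedDate")
        # These check secrets are read once; delete only after they have been
        # accessed and that access is older than the cutoff.
        if accessed is not None and accessed < cutoff:
            client.delete_secret(SecretId=secret["Name"], ForceDeleteWithoutRecovery=True)
```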

@giacomochiarella

giacomochiarella commented Feb 25, 2025

@JonsSpaghetti can you be more precise, please? I ended up in this situation. All the secrets are there; even if the one with zeros were removed (and it looks to me like it is not removed), there are many others to delete.
The screenshot below shows the situation right after I set up the Salesforce source, without ever changing a secret: just the very first creation.

Image

This would cause uncontrolled extra costs.

@isaac-perez-nexthink

Thank you all. This is fixed now on v1.5.0.

@JonsSpaghetti
Contributor

@giacomochiarella that looks correct to me. The Salesforce source has multiple secrets in its config, and each one creates a single secret. The temp secrets are the ones that should be cleaned up, and I created a related ticket to track that. I'm going to close this ticket, as the last comment states this specific issue is resolved in 1.5.0.

@giacomochiarella

giacomochiarella commented Feb 25, 2025

@JonsSpaghetti could you link the ticket for the other issue so I can post on it?

@JonsSpaghetti
Contributor

@omreego

omreego commented Mar 7, 2025

Does this change support the previous AWS Secrets Manager secrets, which don't have the _vX suffix?
We updated to 1.5.0 and are noticing that updating old secrets is not possible.
Update: We have a condition that Airbyte can only update or create secrets with the tag "airbyte" set to "true".
When we remove that condition, the update no longer fails.
