[destination-S3] Clearing storage area increasingly slow on long-lived airbyte instances #21375

Open
@OrangeManLives

Description

Environment

Is this your first time deploying Airbyte?: No
OS Version / Instance: Oracle Linux 8 Ec2 Instance
Memory / Disk: 32GB / 100GB
Deployment: Docker
Airbyte Version: 0.40.22
Source name/version: Postgres 1.0.33
Destination name/version: S3 0.3.17
Step: The issue is happening upon re-adding a connection and performing the reset of streams for a connection

Current Behavior

I occasionally delete and recreate a number of connections, which triggers a new initial sync of the data, as desired. However, when I set up a connection on an Airbyte EC2 instance that has been running for some time, the ‘Clearing storage area in destination’ step of the stream reset takes longer and longer to complete. I have seen it take over 4 hours for one database with around 666 tables, yet after I rebuilt the Airbyte environment by destroying the EC2 instance and reconfiguring it via Ansible, the same step took less than a minute.

I have tried manually emptying the destination bucket beforehand so that this step has nothing to clean up in the destination. This made no difference, so I believe the slowdown relates to local data or preparation rather than to the bucket contents.

The excerpt under Logs is from the run that took over 4 hours. Note how it attempts to clear the same destination path many times over, and the time between attempts.

Expected Behavior

I would expect the run time of this step to be consistent regardless of how long the Airbyte instance has been running.

Logs

2023-01-03 09:47:40 destination > class io.airbyte.integrations.destination.buffered_stream_consumer.BufferedStreamConsumer started.
2023-01-03 09:47:40 destination > Preparing bucket in destination started for 668 streams
2023-01-03 09:47:40 destination > Clearing storage area in destination started for namespace db4-chandos stream daysofweek bucketObject / pathFormat //${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_
2023-01-03 09:47:42 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:44 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:45 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:47 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:48 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:50 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:52 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:53 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:55 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:56 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:58 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:47:59 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:01 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:02 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:04 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:06 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:07 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:09 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:09 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/daysofweek/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:09 destination > Clearing storage area in destination completed for namespace db4-chandos stream daysofweek bucketObject /
2023-01-03 09:48:09 destination > Clearing storage area in destination started for namespace db4-chandos stream jacs bucketObject / pathFormat //${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_
2023-01-03 09:48:11 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:12 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:14 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:16 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:17 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:19 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:20 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:22 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:23 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:25 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:26 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:28 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:30 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:31 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:33 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:34 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:36 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:37 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
2023-01-03 09:48:38 destination > Storage bucket / has been cleaned-up (0 objects matching /db4-chandos/jacs/[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)...
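To make the repetition in the excerpt easier to quantify, here is a minimal Python sketch that counts the "cleaned-up" passes per stream and measures the time between the "started" and "completed" entries. It is not part of Airbyte; the function names `split_entries` and `summarize` are hypothetical helpers, and the regular expressions are assumptions based only on the message wording visible in the log above. `SAMPLE` contains four entries copied from the excerpt, not the whole log.

```python
import re
from collections import Counter
from datetime import datetime

# Patterns assumed from the log wording in this issue, not from Airbyte's source.
ENTRY = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) destination > ")
STARTED = re.compile(
    r"Clearing storage area in destination started for namespace \S+ stream (\S+)")
COMPLETED = re.compile(
    r"Clearing storage area in destination completed for namespace \S+ stream (\S+)")
CLEANED = re.compile(r"has been cleaned-up \(\d+ objects matching /[^/]+/([^/]+)/")


def split_entries(text):
    """Yield (timestamp, message) pairs, even when entries share one line."""
    parts = ENTRY.split(text)  # [prefix, ts1, msg1, ts2, msg2, ...]
    for ts, msg in zip(parts[1::2], parts[2::2]):
        yield datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), msg.strip()


def summarize(text):
    """Return (cleanup passes per stream, seconds from start to completion)."""
    passes, started, elapsed = Counter(), {}, {}
    for ts, msg in split_entries(text):
        if m := STARTED.search(msg):
            started[m.group(1)] = ts
        elif m := COMPLETED.search(msg):
            stream = m.group(1)
            if stream in started:
                elapsed[stream] = (ts - started[stream]).total_seconds()
        elif m := CLEANED.search(msg):
            passes[m.group(1)] += 1
    return passes, elapsed


# Four entries copied from the excerpt above (the full log has many more).
SAMPLE = (
    "2023-01-03 09:47:40 destination > Clearing storage area in destination "
    "started for namespace db4-chandos stream daysofweek bucketObject / "
    "pathFormat //${NAMESPACE}/${STREAM_NAME}/${YEAR}_${MONTH}_${DAY}_${EPOCH}_ "
    "2023-01-03 09:47:42 destination > Storage bucket / has been cleaned-up "
    "(0 objects matching /db4-chandos/daysofweek/"
    "[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)... "
    "2023-01-03 09:48:09 destination > Storage bucket / has been cleaned-up "
    "(0 objects matching /db4-chandos/daysofweek/"
    "[0-9]{4}_[0-9]{2}_[0-9]{2}_[0-9]+_.* were deleted)... "
    "2023-01-03 09:48:09 destination > Clearing storage area in destination "
    "completed for namespace db4-chandos stream daysofweek bucketObject /"
)

passes, elapsed = summarize(SAMPLE)
print(dict(passes), elapsed)
```

Counting by hand in the full excerpt, both daysofweek and jacs show 19 cleanup passes spread over about 29 seconds, each reporting 0 objects deleted. If every one of the 668 streams behaved like these two, the step would take roughly 668 × 29 s ≈ 5.4 hours, which is in line with the "over 4 hours" observed.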

Steps to Reproduce

  1. Run a number of connections for a week or so
  2. Delete and add a connection
  3. Perform the stream reset/refresh and note how much longer the ‘Clearing storage area in destination’ step takes compared to the initial add

Are you willing to submit a PR?

I would love to contribute, but Java is not in my current skill set.
