Azure Blob Storage destination crashes due to lack of buffering #5980

Closed
@vholmer

Description

Environment

  • Airbyte version: 0.26.15-alpha
  • OS Version / Instance: Ubuntu 18.04
  • Deployment: Docker
  • Source Connector and version: Oracle DB 0.3.3
  • Destination Connector and version: Azure Blob Storage 0.1.0
  • Severity: Critical (for this destination connector)
  • Step where error happened: Sync job

Current Behavior

The sync crashes once the file reaches 50,000 committed blocks, because BlobStorageClient does not do any buffering of its own. Each message is sent as a separate block, which is inefficient and exhausts the block limit after 50,000 messages. We should buffer messages before sending them to the blob.
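To illustrate the mechanism, here is a minimal sketch that uses only java.io. The stack trace in the logs below shows PrintWriter.println calling flush all the way down to the blob output stream, which is what a PrintWriter does when auto-flush is enabled. CountingOutputStream is a hypothetical stand-in for the blob stream, and the idea that each flush on the real stream commits one block is an assumption, although it matches the 50,000 lines and 50,000 blocks reported here.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class FlushCountDemo {

    /** Hypothetical stand-in for the blob output stream: discards bytes, counts flush() calls. */
    static final class CountingOutputStream extends OutputStream {
        int flushes = 0;

        @Override
        public void write(int b) {
            // Bytes are discarded; only the flush count matters here.
        }

        @Override
        public void flush() {
            flushes++;
        }
    }

    /** Writes `records` JSON lines and returns how many flushes reached the sink. */
    static int countFlushes(boolean autoFlush, int records) {
        CountingOutputStream sink = new CountingOutputStream();
        PrintWriter writer = new PrintWriter(
            new BufferedWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8)),
            autoFlush);
        for (int i = 0; i < records; i++) {
            writer.println("{\"id\":" + i + "}");
        }
        writer.flush(); // one explicit flush at the end in both cases
        return sink.flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(true, 1000));  // one flush per record plus the final one
        System.out.println(countFlushes(false, 1000)); // only the final flush
    }
}
```

With auto-flush on, the number of flushes grows with the number of records; with it off, the BufferedWriter decides when bytes move downstream and the sink sees only the explicit flush.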

Expected Behavior

The sync shouldn't crash; the file should keep growing without accumulating a high committed block count.
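One way to get this behavior is to collect records in memory and hand them to the blob stream only in larger chunks. The sketch below is not the connector's code: ChunkingOutputStream, minChunkBytes, and chunksFor are hypothetical names, the sink is an in-memory stream rather than an Azure stream, and the 1 MiB threshold is an arbitrary choice for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper: forwards data to the sink only in chunks of at least minChunkBytes. */
class ChunkingOutputStream extends OutputStream {
    private final OutputStream sink;
    private final int minChunkBytes;
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    int chunksSent = 0;

    ChunkingOutputStream(OutputStream sink, int minChunkBytes) {
        this.sink = sink;
        this.minChunkBytes = minChunkBytes;
    }

    @Override
    public void write(int b) throws IOException {
        pending.write(b);
        sendIfFull();
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        pending.write(buf, off, len);
        sendIfFull();
    }

    /** Deliberately a no-op, so per-record flushes do not produce tiny chunks. */
    @Override
    public void flush() {
    }

    @Override
    public void close() throws IOException {
        if (pending.size() > 0) {
            send(); // push out whatever is left
        }
        sink.close();
    }

    private void sendIfFull() throws IOException {
        if (pending.size() >= minChunkBytes) {
            send();
        }
    }

    private void send() throws IOException {
        pending.writeTo(sink);
        sink.flush(); // the only place the sink is flushed
        pending.reset();
        chunksSent++;
    }

    /** Writes `records` records of `recordBytes` bytes, flushing after each, and returns the chunk count. */
    static int chunksFor(int records, int recordBytes, int minChunkBytes) {
        ChunkingOutputStream out =
            new ChunkingOutputStream(new ByteArrayOutputStream(), minChunkBytes);
        byte[] record = new byte[recordBytes];
        try {
            for (int i = 0; i < records; i++) {
                out.write(record);
                out.flush(); // mimics a writer that flushes after every message
            }
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.chunksSent;
    }
}
```

Under these assumptions, `ChunkingOutputStream.chunksFor(50_000, 100, 1 << 20)` pushes 50,000 records of 100 bytes to the sink in 5 chunks instead of 50,000. The trade-off is that up to one chunk of data sits in memory and is lost if the process dies before close().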

Other details

The file was created and filled automatically by Airbyte and contained 50,000 lines and 50,000 committed blocks.

Logs

LOG
2021-09-10 08:08:43 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-10 08:08:53 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-10 08:09:03 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-10 08:09:11 ERROR () LineGobbler(voidCall):85 - Sep 10, 2021 8:09:11 AM oracle.simplefan.impl.FanManager configure
2021-09-10 08:09:11 ERROR () LineGobbler(voidCall):85 - SEVERE: attempt to configure ONS in FanManager failed with oracle.ons.NoServersAvailable: Subscription time out
2021-09-10 08:09:13 INFO () LogClientSingleton(setJobMdc):146 - Setting docker job mdc
2021-09-10 08:09:14 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:14 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):478 - {} - Queueing query for table: ANONYMIZED_NAME1
2021-09-10 08:09:14 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:14 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):478 - {} - Queueing query for table: ANONYMIZED_NAME2
2021-09-10 08:09:14 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:14 INFO i.a.i.s.r.AbstractRelationalDbSource(queryTableFullRefresh):478 - {} - Queueing query for table: ANONYMIZED_NAME3
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 ERROR c.a.c.u.l.ClientLogger(performLogging):350 - {} - com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 ERROR i.a.i.d.a.AzureBlobStorageConsumer(acceptTracked):167 - {} - Failed to write messagefor stream io.airbyte.integrations.destination.azure_blob_storage.jsonl.AzureBlobStorageJsonlWriter@4f63e3c7, details: com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 WARN i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):78 - {} - Airbyte message consumer: failed.
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 WARN i.a.i.d.a.w.BaseAzureBlobStorageWriter(close):66 - {} - Failure detected. Aborting upload of stream 'ANONYMIZED_NAME1'...
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 ERROR c.a.c.u.l.ClientLogger(performLogging):350 - {} - com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-09-10 08:09:16 ERROR c.a.c.u.l.ClientLogger(performLogging):350 - {} - com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.azure_blob_storage.AzureBlobStorageConsumer.acceptTracked(AzureBlobStorageConsumer.java:169)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.accept(FailureTrackingAirbyteMessageConsumer.java:66)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:167)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:148)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.azure_blob_storage.AzureBlobStorageDestination.main(AzureBlobStorageDestination.java:47)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	Suppressed: java.lang.RuntimeException: com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at com.azure.storage.common.StorageOutputStream.checkStreamState(StorageOutputStream.java:79)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at com.azure.storage.blob.specialized.BlobOutputStream.close(BlobOutputStream.java:119)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at java.base/sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:353)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at java.base/sun.nio.cs.StreamEncoder.close(StreamEncoder.java:168)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at java.base/java.io.OutputStreamWriter.close(OutputStreamWriter.java:255)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at java.base/java.io.BufferedWriter.close(BufferedWriter.java:269)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at java.base/java.io.PrintWriter.close(PrintWriter.java:415)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at io.airbyte.integrations.destination.azure_blob_storage.jsonl.AzureBlobStorageJsonlWriter.closeWhenFail(AzureBlobStorageJsonlWriter.java:86)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at io.airbyte.integrations.destination.azure_blob_storage.writer.BaseAzureBlobStorageWriter.close(BaseAzureBlobStorageWriter.java:67)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at io.airbyte.integrations.destination.azure_blob_storage.AzureBlobStorageConsumer.close(AzureBlobStorageConsumer.java:176)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.close(FailureTrackingAirbyteMessageConsumer.java:82)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:161)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 		... 2 more
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - Caused by: java.lang.RuntimeException: com.azure.storage.blob.models.BlobStorageException: Status code 409, "<?xml version="1.0" encoding="utf-8"?><Error><Code>BlockCountExceedsLimit</Code><Message>The committed block count cannot exceed the maximum limit of 50,000 blocks.
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - Time:2021-09-10T08:09:16.3414416Z</Message></Error>"
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at com.azure.storage.common.StorageOutputStream.checkStreamState(StorageOutputStream.java:79)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at com.azure.storage.common.StorageOutputStream.flush(StorageOutputStream.java:89)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:327)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:159)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/java.io.OutputStreamWriter.flush(OutputStreamWriter.java:251)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/java.io.BufferedWriter.flush(BufferedWriter.java:257)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/java.io.PrintWriter.newLine(PrintWriter.java:568)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/java.io.PrintWriter.println(PrintWriter.java:711)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at java.base/java.io.PrintWriter.println(PrintWriter.java:822)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.azure_blob_storage.jsonl.AzureBlobStorageJsonlWriter.write(AzureBlobStorageJsonlWriter.java:74)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	at io.airbyte.integrations.destination.azure_blob_storage.AzureBlobStorageConsumer.acceptTracked(AzureBlobStorageConsumer.java:164)
2021-09-10 08:09:16 ERROR () LineGobbler(voidCall):85 - 	... 4 more

Steps to Reproduce

  1. Sync a large dataset (>50 MB) into the destination connector described above.
  2. 🔥🔥🔥

Are you willing to submit a PR?

No.
