Description
Environment
- Airbyte version: 0.40.0-alpha
- OS Version / Instance: Azure Kubernetes Service
- Deployment: Kubernetes Helm Chart
- Source Connector and version: PostgreSQL
- Destination Connector and version: Azure Blob Storage Alpha
- Step where error happened: Check Connection to Destination Azure Blob Storage
Current Behavior
The connection check times out because it loops through every single blob in the container. The check is performed in AzureBlobStorageConnectionChecker.attemptWriteAndDelete():
public void attemptWriteAndDelete() {
  initTestContainerAndBlob();
  writeUsingAppendBlock("Some test data");
  listBlobsInContainer()
      .forEach(
          blobItem -> LOGGER.info(
              "Blob name: " + blobItem.getName() + "Snapshot: " + blobItem.getSnapshot()));
  deleteBlob();
}
The listBlobsInContainer() call attempts to list all blobs with the prefix "/", which means every blob in the whole container. With a large data lake, that means millions of entries are looped through and logged.
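Because the check only needs to confirm that the blob it just wrote is visible, one option is to scope the listing to that blob's own prefix. The Java below is a toy in-memory model, not the Azure SDK: a TreeSet stands in for the container (Azure documents list results as returned in alphabetical order by name), and "airbyte-check/" is a hypothetical prefix for the check's test blob. The scan starts at the prefix and stops at the first non-matching name, so the work is proportional to the matching blobs rather than to the size of the container.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixScopedListing {

    // Return the names that start with `prefix`, visiting only that key range.
    // Sorted order guarantees all matches are contiguous, so the scan can stop
    // at the first name past the range instead of walking the whole container.
    static List<String> listWithPrefix(NavigableSet<String> container, String prefix) {
        List<String> result = new ArrayList<>();
        for (String name : container.tailSet(prefix, true)) {
            if (!name.startsWith(prefix)) {
                break;
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy container: a TreeSet keeps blob names in lexicographic order.
        NavigableSet<String> container = new TreeSet<>();
        for (int i = 0; i < 100_000; i++) {
            container.add("brad-test/parts/part-" + i + ".snappy.parquet");
        }
        // Hypothetical location of the blob written by the connection check.
        container.add("airbyte-check/test-blob");

        // Prints [airbyte-check/test-blob]; none of the 100,000 other names are logged.
        System.out.println(listWithPrefix(container, "airbyte-check/"));
    }
}
```

In the real service the prefix filter is applied server-side when it is sent with the list request; the TreeSet range scan only mimics that effect.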
Expected Behavior
Limit the number of blobs that the connection check attempts to list. You can pass options to the Azure SDK listBlobs() call to limit the results.
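A minimal sketch of capping the listing, under stated assumptions. In the Azure SDK for Java, ListBlobsOptions provides setPrefix(String) and setMaxResultsPerPage(Integer), and BlobContainerClient.listBlobs(ListBlobsOptions, Duration) accepts it. setMaxResultsPerPage only bounds the page size, so, assuming later pages are fetched only on demand, the caller must also stop iterating to avoid requesting them. The self-contained Java below models that without the SDK: LazyNames is a hypothetical lazy source standing in for the paged result, and firstN is a hypothetical helper that stops after maxItems while counting how many names were actually pulled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class CappedListing {

    // Hypothetical lazy source standing in for a paged blob listing: names are
    // generated one at a time, and `produced` counts how many were pulled.
    static class LazyNames implements Iterable<String> {
        private final int total;
        int produced = 0;

        LazyNames(int total) {
            this.total = total;
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private int index = 0;

                @Override
                public boolean hasNext() {
                    return index < total;
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    produced++;
                    return "blob-" + index++;
                }
            };
        }
    }

    // Hypothetical helper: collect at most maxItems names, then stop pulling.
    static List<String> firstN(Iterable<String> names, int maxItems) {
        List<String> out = new ArrayList<>();
        Iterator<String> it = names.iterator();
        while (out.size() < maxItems && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        LazyNames container = new LazyNames(5_000_000);
        for (String name : firstN(container, 10)) {
            System.out.println("Blob name: " + name);
        }
        // Prints "10 of 5000000 names pulled": the rest are never generated.
        System.out.println(container.produced + " of 5000000 names pulled");
    }
}
```

Combining a narrow prefix with an early stop keeps the check's cost independent of how many blobs the container already holds.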
Logs
Log4j2Appender says: 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14048-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
2022-08-26 16:10:05 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14049-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
Log4j2Appender says: 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14049-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
2022-08-26 16:10:05 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14050-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
Log4j2Appender says: 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14050-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
2022-08-26 16:10:05 INFO i.a.w.i.DefaultAirbyteStreamFactory(lambda$create$0):61 - 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14051-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
Log4j2Appender says: 2022-08-26 16:10:05 INFO i.a.i.d.a.AzureBlobStorageConnectionChecker(lambda$attemptWriteAndDelete$0):54 - Blob name: brad-test/parts/daily_store_inventory_part_pln_dated.parquet/part-14051-6ac91a9f-5924-4385-971e-43348051854c-c000.snappy.parquetSnapshot: null
Steps to Reproduce
- Set up an Azure Blob Storage container and fill it with millions of files.
- Set up the Azure Blob Storage destination and test the connection.
- The call to the check_connection/ API endpoint stays pending indefinitely. The worker's log output shows it looping through every single file.