Support JsonSchema anyOf when writing Parquet/Avro in S3 destination #4294

Closed
@olivermeyer

Description

Expected Behavior

I have a connection between Salesforce and S3 (Parquet). The expected behaviour is that the sync should work, and data should be written to S3.

Current Behavior

The sync starts but quickly hangs with no further messages in the logs.

Logs

Since the Salesforce connector exposes credentials in plain text in the logs, I cannot post them in full. However, I found the following excerpt, which seems relevant:

2021-06-23 14:25:34 INFO () DefaultAirbyteStreamFactory(lambda$create$0):73 - 2021-06-23 14:25:34 [33mWARN[m i.a.i.b.FailureTrackingAirbyteMessageConsumer(close):78 - {} - Airbyte message consumer: failed.
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" java.lang.IllegalStateException: Field CreatedDate has no type
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.parquet.JsonToAvroSchemaConverter.getTypes(JsonToAvroSchemaConverter.java:74)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.parquet.JsonToAvroSchemaConverter.getNonNullTypes(JsonToAvroSchemaConverter.java:68)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.parquet.JsonToAvroSchemaConverter.getNullableFieldTypes(JsonToAvroSchemaConverter.java:189)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.parquet.JsonToAvroSchemaConverter.getAvroSchema(JsonToAvroSchemaConverter.java:137)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.writer.ProductionWriterFactory.create(ProductionWriterFactory.java:58)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.S3Consumer.startTracked(S3Consumer.java:103)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.base.FailureTrackingAirbyteMessageConsumer.start(FailureTrackingAirbyteMessageConsumer.java:54)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.base.IntegrationRunner.consumeWriteStream(IntegrationRunner.java:127)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.base.IntegrationRunner.run(IntegrationRunner.java:113)
2021-06-23 14:25:34 ERROR () LineGobbler(voidCall):85 - at io.airbyte.integrations.destination.s3.S3Destination.main(S3Destination.java:49)
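For context on what the trace suggests: the converter appears to expect a top-level "type" key on every property, while a field declared via JsonSchema anyOf carries its types inside the anyOf members instead. A minimal Python sketch (hypothetical schema shapes, not the actual connector output or Airbyte code) reproduces the same failure mode:

```python
# Hypothetical property shapes: a plain typed field vs. one declared via anyOf,
# roughly what a Salesforce date-time field like CreatedDate might look like.
plain_field = {"type": ["null", "string"]}
any_of_field = {
    "anyOf": [
        {"type": "null"},
        {"type": "string", "format": "date-time"},
    ]
}

def get_types(name: str, prop: dict) -> list:
    """Mimics a converter that only inspects the top-level 'type' key."""
    if "type" not in prop:
        raise ValueError(f"Field {name} has no type")
    t = prop["type"]
    return t if isinstance(t, list) else [t]

print(get_types("Name", plain_field))  # works: ['null', 'string']
try:
    get_types("CreatedDate", any_of_field)
except ValueError as e:
    print(e)  # same message shape as the log: Field CreatedDate has no type
```

This would explain why only certain streams break: any stream whose schema uses anyOf for a field would hit the same IllegalStateException.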

Steps to Reproduce

  1. Set up a Salesforce source
  2. Set up an S3 destination, with Parquet format
  3. Set up a connection between the two, syncing the Account stream (other streams might be affected as well)
  4. Trigger the sync and wait

Severity of the bug for you

Critical - CSV is not acceptable as a file format for us, and not having this connection is an immediate showstopper.

Airbyte Version

0.26.2-alpha

Connector Version (if applicable)

Salesforce: 0.2.1
S3: 0.1.6

Additional context

I tried syncing another stream (UserPreference) and ran into the same issue. The logs were similar too:

2021-06-23 14:39:22 ERROR () LineGobbler(voidCall):85 - Exception in thread "main" java.lang.IllegalStateException: Field SystemModstamp has no type

There definitely seems to be a pattern, but I'm not familiar enough with Airbyte's internals to understand it.
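If it helps, one plausible handling would be to flatten anyOf (and similar composition keywords) into a union of the member types before the Avro conversion. This is only a sketch under that assumption, not the actual Airbyte fix:

```python
def resolve_types(prop: dict) -> list:
    """Collect JSON schema types, flattening anyOf/oneOf/allOf into a union."""
    if "type" in prop:
        t = prop["type"]
        return t if isinstance(t, list) else [t]
    for key in ("anyOf", "oneOf", "allOf"):
        if key in prop:
            types = []
            for sub in prop[key]:
                for t in resolve_types(sub):
                    if t not in types:  # deduplicate while keeping order
                        types.append(t)
            return types
    raise ValueError("no type information")

print(resolve_types({"anyOf": [{"type": "null"}, {"type": "string"}]}))
# ['null', 'string']
```

The resulting type list maps naturally onto an Avro union, which is how nullable fields are already represented.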

I can also confirm that the following works:

  • Reading from another source and writing to S3 in Parquet
  • Reading from Salesforce and writing to S3 in CSV
