Temporarily set materializationProject and materializationDataset to get bq connector to create temp tables. (#222)
## Summary
https://app.asana.com/0/1208949807589885/1209143482009694
Ran into this while debugging an early client integration:
```
Caused by: java.lang.IllegalArgumentException: Provided dataset is null or empty
at com.google.cloud.spark.bigquery.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:143)
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.TableId.<init>(TableId.java:73)
at com.google.cloud.spark.bigquery.repackaged.com.google.cloud.bigquery.TableId.of(TableId.java:82)
at com.google.cloud.bigquery.connector.common.BigQueryClient.createTempTableId(BigQueryClient.java:263)
at com.google.cloud.bigquery.connector.common.BigQueryClient.createTempTable(BigQueryClient.java:229)
at com.google.cloud.bigquery.connector.common.BigQueryClient.createTempTableAfterCheckingSchema(BigQueryClient.java:253)
at com.google.cloud.spark.bigquery.write.BigQueryWriteHelper.writeDataFrameToBigQuery(BigQueryWriteHelper.java:142)
... 66 more
```
I see that this error occurs at these [lines in the open source
connector
code](https://github.com/GoogleCloudDataproc/spark-bigquery-connector/blob/master/bigquery-connector-common/src/main/java/com/google/cloud/bigquery/connector/common/BigQueryClient.java#L258-L260):
```
public TableId createTempTableId(TableId destinationTableId) {
  String tempProject = materializationProject.orElseGet(destinationTableId::getProject);
  String tempDataset = materializationDataset.orElseGet(destinationTableId::getDataset);
  String tableName = destinationTableId.getTable() + System.nanoTime();
  TableId tempTableId =
      tempProject == null
          ? TableId.of(tempDataset, tableName)
          : TableId.of(tempProject, tempDataset, tableName);
  return tempTableId;
}
```
My hunch is that we're hitting this error because `destinationTableId::getDataset` is returning nothing, and `materializationDataset` is a connector property we never set since it's really meant for views (I think). Just a theory.
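The theory is easy to sanity-check in isolation. Below is a tiny self-contained sketch (hypothetical class and method names, not the real connector classes) that just replicates the `Optional` fallback from `createTempTableId`: when `materializationDataset` is unset and the destination `TableId` carries no dataset, the computed temp dataset ends up null, which is exactly what `TableId.of` would then reject.

```java
import java.util.Optional;

// Hypothetical replication of the connector's fallback logic, for illustration only.
public class TempTableIdSketch {

    // Mirrors: materializationDataset.orElseGet(destinationTableId::getDataset)
    static String resolveTempDataset(Optional<String> materializationDataset,
                                     String destinationDataset) {
        return materializationDataset.orElse(destinationDataset);
    }

    public static void main(String[] args) {
        // Property unset + destination TableId has no dataset -> null,
        // which TableId.of(...) would reject as "null or empty".
        System.out.println(resolveTempDataset(Optional.empty(), null));

        // With the property set, the missing destination dataset no longer matters.
        System.out.println(resolveTempDataset(Optional.of("data"), null));
    }
}
```

Setting the property short-circuits the fallback entirely, which matches the workaround below.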
To get past this weird error, we can explicitly set the connector properties `materializationProject` and `materializationDataset`:
https://github.com/GoogleCloudDataproc/spark-bigquery-connector?tab=readme-ov-file#properties
So I set them in additional-confs-dev.yaml and reran the groupby backfill job with them:
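For reference, this is roughly the shape of what was added to additional-confs-dev.yaml. The exact file layout is an assumption; the property names and values come straight from the options map in the job log below.

```
# Hypothetical sketch of the additional-confs-dev.yaml change
materializationProject: "canary-443022"
materializationDataset: "data"
```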
[dataproc
job](https://console.cloud.google.com/dataproc/jobs/b7ebcff7-007f-43e4-9979-211803e9c700/configuration?region=us-central1&inv=1&invt=AbmqZQ&project=canary-443022)
```
Writing to BigQuery. options: Map(project -> canary-443022, writeMethod -> indirect, spark.sql.sources.partitionOverwriteMode -> DYNAMIC, partitionField -> ds, materializationDataset -> data, dataset -> data, materializationProject -> canary-443022, temporaryGcsBucket -> zl-warehouse)
```
And now I'm able to consistently get past the `Provided dataset is null or empty` error and "massage" the connector into creating the temp table.
## Checklist
- [ ] Added Unit Tests
- [ ] Covered by existing CI
- [ ] Integration tested
- [ ] Documentation update
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Configuration Updates**
- Added new Google Cloud Storage configuration properties for Chronon
integration
- Specified output dataset and project details for data materialization
<!-- end of auto-generated comment: release notes by coderabbit.ai -->