"spark.sql.sources.partitionOverwriteMode": "DYNAMIC" - creates additional tables #1314
Comments
Hi @MichalBogoryja, what's the connector version you are using? Please try with the latest connector version.
Hi @isha97, I ran a test under similar conditions to those described by @MichalBogoryja, and the issue is still present in version …
Hi @isha97, can you modify the cleanup process? I guess the most efficient way would be to trigger the cleanup just after the write to the partitioned BigQuery table finishes (sketched below).
Hi @isha97, @davidrabinowitz …
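In the spirit of the cleanup suggestion above, a user-side workaround could look roughly like the sketch below. This is not the connector's own cleanup logic, only a minimal sketch: it assumes the leftover tables keep the `{table_name}<digits>` naming shown in this report, that the `google-cloud-bigquery` client library is available, and it reuses the `gcp_project_id`, `db`, and `table_name` placeholders from the reproduction steps.

```python
import re
from google.cloud import bigquery


def drop_leftover_tables(gcp_project_id: str, db: str, table_name: str) -> None:
    """Delete helper tables named like '<table_name><digits>' left behind after the write."""
    client = bigquery.Client(project=gcp_project_id)
    leftover = re.compile(rf"^{re.escape(table_name)}\d+$")
    for table in client.list_tables(f"{gcp_project_id}.{db}"):
        if table.table_id != table_name and leftover.match(table.table_id):
            client.delete_table(f"{gcp_project_id}.{db}.{table.table_id}", not_found_ok=True)


# Call right after the DataFrame write finishes, e.g.:
# sdf.write.format("bigquery").mode("overwrite").save(f"{gcp_project_id}.{db}.{table_name}")
# drop_leftover_tables(gcp_project_id, db, table_name)
```

Matching on the exact `<table_name><digits>` pattern keeps the deletion conservative, but any table whose name genuinely follows that pattern would also be removed, so the pattern may need tightening for a real dataset.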
When writing a Spark DataFrame to an existing partitioned BigQuery table, the target table ends up modified in the expected way (a partition is added/overwritten). However, an additional table is also saved; it contains exactly the data of the DataFrame that I was writing to the target table.
To reproduce:
database state: empty
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.sql.sources.partitionOverwriteMode", "DYNAMIC") \
    .config("enableReadSessionCaching", "false") \
    .getOrCreate()

# sdf is an existing DataFrame with a 'curdate' date column used for partitioning
sdf.write.format("bigquery") \
    .option('partitionField', 'curdate') \
    .option('partitionType', 'DAY') \
    .mode('overwrite') \
    .save(f"{gcp_project_id}.{db}.{table_name}")
```
database state:
- one table named {table_name} - data as in sdf
```python
sdf_2.write.format("bigquery").mode('overwrite').save(f"{gcp_project_id}.{db}.{table_name}")
```
database state:
- one table named {table_name} - data as in sdf plus the new data from sdf_2 (or, if sdf_2 contains the same partitions as sdf, the original partitions are overwritten)
- an ADDITIONAL table named {table_name}<random_digits> (e.g. table_name4467706876500)
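For reference, the extra table can be confirmed by listing the tables in the dataset. This is only an illustrative sketch using the google-cloud-bigquery client, reusing the placeholders from the snippets above; the printed names mirror the example in this report.

```python
from google.cloud import bigquery

# gcp_project_id and db are the same placeholders used in the snippets above.
client = bigquery.Client(project=gcp_project_id)
for table in client.list_tables(f"{gcp_project_id}.{db}"):
    print(table.table_id)

# Prints something like:
#   table_name
#   table_name4467706876500   <- the unexpected extra table
```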
Can you modify the saving function so that it does not create this additional table (or drops it right after the save process finishes)?