
[query] Make pyToDF use long lived temp files (again) #14907


Merged


chrisvittal
Collaborator

The partition function in TableValue.toDF can outlive the ExecuteContext. Any temporary files needed by later stages must therefore still exist when those stages run, so pyToDF needs to set selfContainedExecution to false.
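
The lifetime problem can be illustrated with a minimal Python sketch. This is not Hail's implementation: `execute_context`, `to_lazy_partition`, and `try_read` are hypothetical stand-ins, and the only assumption carried over from the description is that a self-contained execution cleans up its temp files when the context closes.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def execute_context(self_contained_execution: bool):
    """Hypothetical stand-in for a context that owns a temp directory."""
    tmp_dir = tempfile.mkdtemp(prefix="ctx-")
    try:
        yield tmp_dir
    finally:
        # A self-contained execution assumes nothing outlives the context,
        # so it deletes its temp files on exit.
        if self_contained_execution:
            shutil.rmtree(tmp_dir, ignore_errors=True)


def to_lazy_partition(self_contained_execution: bool):
    """Write an intermediate file, then return a function that reads it later."""
    with execute_context(self_contained_execution) as tmp_dir:
        path = os.path.join(tmp_dir, "part-0")
        with open(path, "w") as f:
            f.write("row-1\nrow-2\n")

        def read_partition():
            # Runs only when the caller asks for rows,
            # possibly after the context has already closed.
            with open(path) as f:
                return f.read().splitlines()

    return read_partition, tmp_dir


def try_read(self_contained_execution: bool):
    """Return the rows, or None if the temp file was already deleted."""
    read_partition, tmp_dir = to_lazy_partition(self_contained_execution)
    try:
        return read_partition()
    except FileNotFoundError:
        return None
    finally:
        # Clean up the long-lived directory once the caller is done with it.
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

In this sketch, `try_read(True)` finds the file gone because the reader runs after the context has cleaned up, while `try_read(False)` still sees both rows. The cost of the second mode is that something else has to delete the files later.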

Fixes the problem reported by @ch-kr on Zulip in the thread "Hail Query 0.2 support > MT to parquet throwing FileNotFoundException: Item not found".

Security Assessment

  • This change cannot impact the Hail Batch instance as deployed by Broad Institute in GCP

@chrisvittal changed the title from "[query] pyToDF must use long lived temp files." to "[query] Make pyToDF use long lived temp files (again)" on Jun 3, 2025
Contributor

@grohli left a comment

Amazing, thanks for finding a fix for this.

Collaborator

@patrick-schultz left a comment

Good catch! If you happen to know the PR that introduced this bug, it would be nice to record that here. If not, no need to spend time tracking it down.

@hail-ci-robot merged commit 841a265 into hail-is:main on Jun 5, 2025
2 checks passed
@chrisvittal
Collaborator Author

It was the py4j backend extensions PR, #14767.
