fix: reduce redundant remote_function deployments #856


Merged
shobsi merged 7 commits into main from shobs-nb-reuse-true
Jul 30, 2024

Conversation

shobsi
Contributor

@shobsi shobsi commented Jul 24, 2024

See the demo at screencast/cast/NTA0OTk2MTc5MTA5NDc4NHwwYjBjNDBiMS01Mw

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue #349872122 🦕

@shobsi shobsi requested review from a team as code owners July 24, 2024 07:06
@shobsi shobsi requested a review from TrevorBergeron July 24, 2024 07:06

@product-auto-label product-auto-label bot added the size: l Pull request size is large. label Jul 24, 2024
@shobsi shobsi marked this pull request as draft July 24, 2024 07:06
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. label Jul 24, 2024
@shobsi shobsi marked this pull request as ready for review July 26, 2024 22:39
Comment on lines +181 to +184
def_copy = cloudpickle.loads(cloudpickle.dumps(def_))
def_copy.__code__ = def_copy.__code__.replace(
    co_filename="bigframes_place_holder_filename"
)
Contributor

Is there a more efficient way to do this? Also, do we want to replace the filename for other code objects in the dependency tree? It seems the most efficient solution would require modifying cloudpickle a bit, though.
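For illustration, one narrow reading of "other code objects" is the nested code objects stored in a function's `co_consts` (inner functions, lambdas, comprehensions). The sketch below walks those recursively with the standard `code.replace` method. The helper name `normalize_filenames` is hypothetical and not part of this PR, and the sketch does not reach functions referenced through globals or closures, which is what a full dependency-tree treatment would need.

```python
import types

# Same placeholder string as in the diff under review.
PLACEHOLDER = "bigframes_place_holder_filename"


def normalize_filenames(code: types.CodeType, filename: str = PLACEHOLDER) -> types.CodeType:
    """Return a copy of `code` with co_filename replaced, recursing into nested code objects.

    Hypothetical helper for illustration only; it handles code objects found in
    co_consts and nothing else.
    """
    new_consts = tuple(
        normalize_filenames(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=new_consts)


def outer(x):
    def inner(y):
        return y + 1

    return inner(x)


# The original code object is left untouched; only the copy is rewritten.
normalized = normalize_filenames(outer.__code__)
nested = [c for c in normalized.co_consts if isinstance(c, types.CodeType)]
```

Like the diff, this operates on a copy, so the live function keeps its real filename for tracebacks.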

Contributor Author

@shobsi shobsi Jul 30, 2024

Re. a more efficient way: the most efficient approach would be to modify one of the pickling routines to accept replacements on a live object. I'll consider this as part of the effort described below.

Re. replacement in the dependency tree: we are not handling the dependency tree yet. I'm looking into cloudpickle (and the native pickle it builds on) to evaluate what modifications we can make. Consider this change an immediate workaround to reduce redeployments in two scenarios:

  1. rerun of a cell in the same session
  2. rerun of a notebook with an explicitly named remote function
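To illustrate why the filename matters in these scenarios, here is a minimal sketch. It assumes the redeployment decision depends on a hash of the serialized function, which the thread implies but does not spell out. The real change serializes with cloudpickle; the `fingerprint` helper, the `define_in_cell` helper, and the cell filenames below are hypothetical stand-ins that use only the standard library.

```python
import hashlib

# Same placeholder string as in the diff under review.
PLACEHOLDER = "bigframes_place_holder_filename"
SOURCE = "def add_one(x):\n    return x + 1\n"


def define_in_cell(source, cell_filename):
    """Compile `source` as if it came from a notebook cell with the given filename."""
    namespace = {}
    exec(compile(source, cell_filename, "exec"), namespace)
    return namespace["add_one"]


def fingerprint(func, normalize):
    """Hash a few code attributes; a stand-in for hashing the cloudpickle payload."""
    code = func.__code__
    if normalize:
        # code.replace returns a new code object; the function itself is unchanged.
        code = code.replace(co_filename=PLACEHOLDER)
    parts = (code.co_filename, code.co_code, code.co_consts, code.co_names, code.co_varnames)
    return hashlib.sha256(repr(parts).encode()).hexdigest()


# Identical source, defined twice under different (made-up) cell filenames.
first = define_in_cell(SOURCE, "cell_1")
second = define_in_cell(SOURCE, "cell_2")

raw_match = fingerprint(first, normalize=False) == fingerprint(second, normalize=False)
normalized_match = fingerprint(first, normalize=True) == fingerprint(second, normalize=True)
print(raw_match, normalized_match)
```

Without normalization the two definitions hash differently even though their bodies are identical; with the placeholder filename they hash the same, which is the property the workaround relies on.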

@shobsi shobsi merged commit cbf2d42 into main Jul 30, 2024
23 checks passed
@shobsi shobsi deleted the shobs-nb-reuse-true branch July 30, 2024 20:00
Labels
api: bigquery (Issues related to the googleapis/python-bigquery-dataframes API.)
size: l (Pull request size is large.)
3 participants