Fix llm hp optimization error #2576
Signed-off-by: helenxie-bit <[email protected]>
Please review when you have time, @andreyvelich @mahdikhashan. Thank you!
The E2E test for the train API failed due to the following error
Updated 2025-03-31: trainer/sdk/python/kubeflow/training/api/training_client.py Lines 296 to 298 in 77bd5cd
to:
This change follows the official documentation, which recommends using
@mahdikhashan Can you help test whether this fixes your issue? I remember you ran into the same problem.
Signed-off-by: helenxie-bit <[email protected]>
…46 of 🤗 Transformers. Use instead' Signed-off-by: helenxie-bit <[email protected]>
/assign
/rerun-all
datasets==3.5.0
transformers==4.50.2
accelerate==1.5.2
tensorboard==2.19.0
What is the use case for tensorboard in these changes?
I believe I ran into an error saying that the tensorboard version
was incorrect, so I pinned its version explicitly here.
thank you
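For anyone reproducing the tensorboard version error discussed above, a minimal sketch of a check that compares the installed packages against the pins in this diff could look like the following. The helper names (parse_pins, installed_version, find_mismatches) are hypothetical and not part of the Trainer SDK; the pins themselves are copied from the diff, and the lookup relies only on the standard library's importlib.metadata.

```python
from importlib import metadata

# Pins copied from the requirements lines in this diff.
PINS = """\
datasets==3.5.0
transformers==4.50.2
accelerate==1.5.2
tensorboard==2.19.0
"""


def parse_pins(text):
    """Parse 'name==version' lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def installed_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {name: (pinned, found)} for packages whose version differs from the pin."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(parse_pins(PINS)).items():
        print(f"{name}: pinned {wanted}, installed {found or 'not installed'}")
```

Running the script inside the trainer image or a local virtual environment prints one line per package whose installed version differs from the pin, and prints nothing when everything matches.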
@helenxie-bit Thank you for this PR, and sorry for the delay. The code changes look good to me. Let me check whether I can run an example from the notebook with these changes.
/rerun-all
/lgtm thank you for your patience
Thank you @helenxie-bit!
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: andreyvelich
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What this PR does / why we need it:
This PR fixes errors that occur when running the example in the user guide with the Katib LLM hyperparameter optimization API, which depends on Trainer SDK v1.9.0.
Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2575
Checklist: