
Fix llm hp optimization error #2576


Conversation

helenxie-bit
Contributor

What this PR does / why we need it:
This PR fixes errors encountered when using the Katib LLM hyperparameter optimization API (which depends on the Trainer SDK v1.9.0) to run the example in the user guide.
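For context, the user-guide example exercised here looks roughly like the condensed sketch below. Argument names follow the Katib 1.9 docs; the model, dataset, and search ranges are illustrative, not taken from this PR:

import kubeflow.katib as katib
import transformers
from kubeflow.katib import KatibClient
from kubeflow.storage_initializer.hugging_face import (
    HuggingFaceDatasetParams,
    HuggingFaceModelParams,
    HuggingFaceTrainerParams,
)
from peft import LoraConfig

cl = KatibClient(namespace="kubeflow")

# Tune LoRA hyperparameters of a small BERT model; Katib replaces the
# katib.search placeholders with concrete values for each trial.
cl.tune(
    name="llm-hp-optimization-example",
    model_provider_parameters=HuggingFaceModelParams(
        model_uri="hf://google-bert/bert-base-cased",
        transformer_type=transformers.AutoModelForSequenceClassification,
    ),
    dataset_provider_parameters=HuggingFaceDatasetParams(repo_id="yelp_review_full"),
    trainer_parameters=HuggingFaceTrainerParams(
        training_parameters=transformers.TrainingArguments(
            output_dir="results",
            learning_rate=katib.search.double(min=1e-5, max=5e-5),
        ),
        # The lora_config below is what failed to serialize before this fix.
        lora_config=LoraConfig(r=katib.search.int(min=8, max=32)),
    ),
    objective_metric_name="train_loss",
    objective_type="minimize",
    max_trial_count=10,
    parallel_trial_count=2,
)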

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2575

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: helenxie-bit <[email protected]>
@helenxie-bit
Contributor Author

Please review when you have time @andreyvelich @mahdikhashan. Thank you!

@helenxie-bit
Contributor Author

helenxie-bit commented Mar 29, 2025

The E2E test for the train API failed with the following error: TypeError: Object of type LoraRuntimeConfig is not JSON serializable. I'm working on fixing it.

Updated 2025-03-31:
I fixed the issue by updating the following line of code:

json.dumps(
    trainer_parameters.lora_config.__dict__, cls=utils.SetEncoder
),

to:

json.dumps(trainer_parameters.lora_config.to_dict(), cls=utils.SetEncoder),

This change follows the official documentation, which recommends using LoraConfig.to_dict() for serialization.
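To illustrate the failure mode, here is a self-contained sketch; the two dataclasses are hypothetical stand-ins for peft's LoraConfig and LoraRuntimeConfig, which behave the same way as far as JSON serialization is concerned:

import json
from dataclasses import asdict, dataclass, field

@dataclass
class LoraRuntimeConfig:  # stand-in for peft's LoraRuntimeConfig
    ephemeral_gpu_offload: bool = False

@dataclass
class LoraConfig:  # stand-in for peft's LoraConfig
    r: int = 8
    lora_alpha: int = 16
    runtime_config: LoraRuntimeConfig = field(default_factory=LoraRuntimeConfig)

    def to_dict(self):
        # Like peft's to_dict(): recursively converts nested dataclasses
        # into plain dicts that json.dumps can handle.
        return asdict(self)

config = LoraConfig()

# __dict__ keeps runtime_config as a dataclass instance, so this raises:
# TypeError: Object of type LoraRuntimeConfig is not JSON serializable
try:
    json.dumps(config.__dict__)
except TypeError as e:
    print(e)

# to_dict() flattens everything to JSON-friendly types, so this succeeds.
print(json.dumps(config.to_dict()))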

@mahdikhashan Can you help test whether this fixes your issue? I remember you ran into the same problem.

@helenxie-bit helenxie-bit changed the title fix llm hp optimization error [WIP] fix llm hp optimization error Mar 29, 2025
Signed-off-by: helenxie-bit <[email protected]>
…46 of 🤗 Transformers. Use instead'

Signed-off-by: helenxie-bit <[email protected]>
@helenxie-bit helenxie-bit changed the title [WIP] fix llm hp optimization error Fix llm hp optimization error Mar 31, 2025
@mahdikhashan
Member

/assign

@andreyvelich
Member

/rerun-all

datasets==3.5.0
transformers==4.50.2
accelerate==1.5.2
tensorboard==2.19.0
Member

What is the use case for tensorboard in these changes?

Contributor Author

I believe I ran into an error saying the tensorboard version was incorrect, so I explicitly pinned its version here.
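For anyone double-checking a local environment against these pins, a minimal sketch (the expected versions simply mirror the requirements change above):

from importlib.metadata import version

# Pins from this PR's requirements update.
pins = {
    "datasets": "3.5.0",
    "transformers": "4.50.2",
    "accelerate": "1.5.2",
    "tensorboard": "2.19.0",
}

for package, expected in pins.items():
    installed = version(package)
    status = "OK" if installed == expected else f"expected {expected}"
    print(f"{package}=={installed} ({status})")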

Member

thank you

@mahdikhashan
Member

mahdikhashan commented Apr 23, 2025

@helenxie-bit Thank you for this PR, and sorry for the delay. The code changes look good to me; let me check whether I can run an example notebook with these changes.

@helenxie-bit
Contributor Author

/rerun-all

@mahdikhashan
Member

/lgtm

thank you for your patience

@google-oss-prow google-oss-prow bot added the lgtm label Apr 25, 2025
@andreyvelich
Member

Thank you @helenxie-bit!
/lgtm
/approve


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit f58e893 into kubeflow:release-1.9 Apr 29, 2025
53 checks passed