adapters/kfp: support distributed training #109

d4l3k · 2021-07-26T21:10:02Z

This adds a new `resource_to_app` KFP adapter that adapts an app to a KFP `ResourceOp`, which launches the operator using the Volcano scheduler. It reuses the code that creates the resources for the Kubernetes scheduler and embeds the resulting resource inside a KFP pipeline.
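As a rough illustration of the idea (not the code in this PR), the sketch below builds the kind of Volcano `Job` manifest that could be embedded in a pipeline as a Kubernetes resource. The helper name `volcano_job_manifest`, the role name, image, and queue are all hypothetical; only the top-level Volcano `Job` fields (`apiVersion`, `kind`, `schedulerName`, `queue`, `minAvailable`, `tasks`) follow the Volcano job spec.

```python
from typing import Any, Dict, List


def volcano_job_manifest(
    name: str,
    image: str,
    command: List[str],
    replicas: int,
    queue: str = "default",
) -> Dict[str, Any]:
    """Build a Volcano Job manifest (plain dict) with one task of `replicas` pods.

    Hypothetical helper for illustration; the adapter in this PR reuses the
    kubernetes scheduler's own resource-building code instead.
    """
    return {
        "apiVersion": "batch.volcano.sh/v1alpha1",
        "kind": "Job",
        "metadata": {"generateName": f"{name}-"},
        "spec": {
            "schedulerName": "volcano",
            "queue": queue,
            # Gang scheduling: require all replicas to be schedulable together.
            "minAvailable": replicas,
            "tasks": [
                {
                    "name": name,
                    "replicas": replicas,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [
                                {"name": name, "image": image, "command": command}
                            ],
                        }
                    },
                }
            ],
        },
    }


# Example values are assumptions chosen for illustration.
manifest = volcano_job_manifest(
    name="dist-echo",
    image="alpine",
    command=["/bin/echo", "hello dist!"],
    replicas=3,
    queue="test",
)
```

In a KFP v1 pipeline, a dict like this could then be passed as the `k8s_resource` argument of `kfp.dsl.ResourceOp` with `action="create"`, which is roughly the embedding step the adapter performs on the user's behalf.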

This isn't supported under KFP v2, since it interacts directly with Kubernetes resources and Volcano. It also requires Volcano to be installed on the cluster, which is why it's a separate adapter rather than being used automatically.

This is still fairly experimental. Once KFP has better distributed support we'll likely want to rely on that instead, since the UX here is less than ideal: you need to use the CLI to access the individual worker logs, and there isn't any support for UI metadata yet.

I think UI metadata can be added by providing an output annotation for Argo as part of the resource, but I haven't looked into it.

Test plan:

pyre
pytest
python dist_pipeline.py

http://5ab6bab9-istiosystem-istio-2af2-1926929629.us-west-2.elb.amazonaws.com/_/pipeline/#/runs/details/27707de9-bc67-42da-ab86-af2127ee54d1

facebook-github-bot · 2021-07-26T22:00:49Z

@d4l3k has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary:

This adds a new `resource_to_app` KFP adapter that allows adapting an app to a kfp ResourceOp that launches the operator using the Volcano scheduler. This reuses the same code that creates the resources for the kubernetes scheduler and embeds the resource inside a KFP pipeline.

This isn't supported under KFP v2 since it interacts directly with kubernetes resources/volcano. This also requires volcano to be installed on the cluster to use which is why it's a new adapter instead of automatically being used.

This is still fairly experimental and once KFP has better distributed support we likely want to rely on that instead since this has some less than ideal UX. You need to use the CLI to access the individual worker logs and there isn't any support for UI metadata yet.

UI metadata I think can be added by providing an output annotation for argo as part of the resource but I haven't looked into it.

Pull Request resolved: #109

Test Plan:

```
pyre
pytest
python dist_pipeline.py
```

http://5ab6bab9-istiosystem-istio-2af2-1926929629.us-west-2.elb.amazonaws.com/_/pipeline/#/runs/details/27707de9-bc67-42da-ab86-af2127ee54d1

![20210726_14h12m04s_grim](https://user-images.githubusercontent.com/909104/127059928-b4787429-e895-4b97-b53e-c6262e99c52b.png)

Reviewed By: kiukchung

Differential Revision: D29921246

Pulled By: d4l3k

fbshipit-source-id: b23c8ea376cb25b4b6fa3e7208c120ec783d750a

facebook-github-bot · 2021-07-27T19:36:30Z

This pull request was exported from Phabricator. Differential Revision: D29921246

facebook-github-bot · 2021-07-27T20:47:37Z

@d4l3k merged this pull request in f6907e8.

facebook-github-bot added the CLA Signed label Jul 26, 2021

d4l3k force-pushed the k8s-logs branch from 53d3c7c to d56a20f July 26, 2021 21:37

d4l3k force-pushed the k8s-logs branch from d56a20f to 8333d2e July 27, 2021 19:36

facebook-github-bot closed this in f6907e8 Jul 27, 2021

facebook-github-bot added the Merged label Jul 27, 2021
