-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Add VolcanoWorker to support Volcano Job submission via Kubernetes work pool and test #17909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
fb34ccc
to
5c10a28
Compare
CodSpeed Performance ReportMerging #17909 will not alter performanceComparing Summary
|
5c10a28
to
0066f99
Compare
hi,@reneleonhardt @zzstoatzz (we have argued before) |
I'm no maintainer sorry. I suggest using a free AI to save contributors and reviewers time: |
Hey @zyn71, neither @zzstoatzz nor I are very familiar with Volcano, and we'll want to test this functionality before merging. Could you provide instructions on how we can test these changes locally? |
Hi @desertaxle — thank you for the quick feedback! Below is a full, single-file guide my team follows to test the new VolcanoWorker locally. Everything has been validated on real clusters; 1 Get server and workpoolCreate a conventional prefect server and prefect-kubernetes workpool.So far nothing is Volcano-specific. 2 Convert K8s work pool to “Volcano mode” manullyOpen K8s Work Pool → Advanced → Job Manifest and paste the manifest below. apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
labels: "{{ labels }}"
namespace: "{{ namespace }}"
generateName: "{{ name }}-"
spec:
schedulerName: volcano
queue: your-volcano-queue
maxRetry: 3
minAvailable: 1
tasks:
- name: pod1
replicas: 1
policies:
- event: TaskCompleted
action: CompleteJob
template:
metadata:
labels:
job-name: "{{ name }}"
volcano.sh/job-name: "{{ name }}"
spec:
restartPolicy: Never
containers:
- name: c1
image: "{{ image }}"
command: "{{ command }}"
env: "{{ env }}"
imagePullPolicy: "{{ image_pull_policy }}" 3 Start a VolcanoWorkerBefore merging into main branch, we must build a new image including above there changes files.(BASE is the newest version prefect) docker build -t your_image . We can uses helm pull prefect/prefect-worker # fetch chart
tar -xzf prefect-worker-*.tgz && cd prefect-worker
# edit values.yaml (connect to prefect server by your owns)➜
config:
type: volcano
workPool:"your work pool"
image:
repository: your image
prefectTag: your tag
helm install prefect-worker . -f values.yaml -n your-namespace No extra image build is needed after merging 4 Deploy & run a flowAny deployment assigned to Extra notes
Thanks again for your time! |
Thanks for that guide @zyn71! Do I need to install Volcano in my cluster before using the Volcano worker? Also, I think it'd be best if this worker didn't require manual changes to the base job template in the default case. Would you be able to add |
hi again @desertaxle !Yes, Volcano must be installed in the cluster before the Regarding your concern about avoiding manual edits to the base job template: Prefect validates the job manifest in several layers, not only inside the worker. Making the worker auto‑switch templates would therefore require extensive changes—perhaps even a brand‑new integration. Considering that our goal is just to extend prefect‑kubernetes with Volcano support, and since If there is anything else you need, please let me know. Thank you! |
@zyn71 I disagree that it would take extensive updates to support a different default manifest for the Volcano worker. The primary validation of the manifest happens in I think one thing that would help is that the Volcano worker doesn't need to be compatible with vanilla Kubernetes jobs. I think the overall design of the new worker would be greatly improved if you dropped that compatibility. |
hi @desertaxle !I totally understand your point—the first thing I tried was exactly what you’re suggesting. If we want to eliminate all manual edits, we would first need to introduce a brand‑new
Because Volcano can sit on top of standard Kubernetes primitives, my goal was to reuse |
@zyn71 we have processes in place to add new supported work pools to the UI, so it will be possible to create a Volcano worker from the UI once the Volcano worker is added to I appreciate your desire not to introduce new churn to the codebase, but I don't think you'll introduce much churn since you're adding something new. If the Volcano worker is going to share code with the Kubernetes worker, I'd like them to be as loosely coupled as possible. One way you can do that is by ensuring the Volcano worker is only responsible for running Volcano jobs. |
Hi @desertaxle! Thanks a lot for the suggestions and guidance. I’ve now added a Following your advice, once the UI can create a Volcano work‑pool (with itsown default job manifest), this worker will be much more self‑contained andloosely coupled with the existing Kubernetes worker. Could you take a look and let me know if this approach works for you? I’m happy to make further adjustments if needed. Thanks! |
Hey @zyn71, we made some pretty significant changes to the Kubernetes worker in #18004 that will require changes to the worker you're working on here. The big change is splitting event emission and crash detection responsibilities into a separate observer that starts alongside the worker. You might be able to reuse the observer, but I'm not sure if the jobs event handler will work with Volanco jobs. Let me know if you have any questions on the new structure! |
hi! @desertaxle,Sorry for the delayed response - I've been quite busy lately and this took longer than expected.
|
Hi! Our MR hasn’t made any progress in a week. Do you have any suggestions, or is there anything else I can do to move it forward or improve it? |
hi @zyn71 - sorry for the delay here. desertaxle is out on leave right now. one of us will review your updates as soon as possible |
Apologies for the noise on the earlier PR #17880.
I am sry While iterating on the Volcano worker I accidentally pushed several unrelated commits that polluted the diff.
I’ve started fresh from the latest upstream/main, cherry-picked only the relevant changes, and added a focused unit-test suite.
The previous PR has been closed in favor of this clean one.
Changes included here (3 files):
volcanoworker.py – new VolcanoWorker implementation
init.py – registers the new worker type
tests/test_volcano_worker.py – unit & smoke tests covering job creation, pod discovery, and state polling
Please let me know if you’d like additional work