
Commit 46c5864

1322: Modified manifests to use all-in-one training-operator (#1346)
* 1322: Modified manifests to use all-in-one training-operator

  WIP. Actions taken:
  - replaced tf-job-operator => training-operator
  - replaced kubeflow-tfjobs- => kubeflow-training-
  - moved CRDs for mxjobs, tfjobs, pytorchjobs and xgboostjobs from config/crd/bases to manifests/base/ and prefixed them with crd_

  Ref: #1322
  Testing steps: to be added.
  Work in progress.

* 1322: synced up config/manager with manifests

  The training operator was found to be working:

<pre>
k -n kubeflow logs -f training-operator-694766989-pp2j4
I0812 21:43:24.739862 1 request.go:645] Throttling request took 1.048945631s, request: GET:https://172.19.0.1:443/apis/networking.k8s.io/v1?timeout=32s
2021-08-12T21:43:25.694Z INFO controller-runtime.metrics metrics server is starting to listen {"addr": ":8080"}
2021-08-12T21:43:25.790Z INFO setup starting manager
2021-08-12T21:43:25.790Z INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2021-08-12T21:43:25.790Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.790Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:25.791Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.289Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.294Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.589Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.688Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z INFO controller-runtime.manager.controller.tf-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.889Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.890Z INFO controller-runtime.manager.controller.mxnet-operator Starting EventSource {"source": "kind source: /, Kind="}
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting Controller
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.tf-operator Starting Controller
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.tf-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting Controller
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.xgboostjob-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.pytorchjob-operator Starting workers {"worker count": 1}
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.mxnet-operator Starting Controller
2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.mxnet-operator Starting workers {"worker count": 1}
</pre>

* 1322: incorporated review comments
  - added all resources in ClusterRole
* 1322: incorporated review comments
  - now controller-gen generates the CRDs directly in manifests/base instead of config/crd/bases
  - updated setup-training-operator.sh to use manifests/overlays/standalone
* 1322: removed config/crd/bases as it's now generated in manifests
* 1322: incorporated review comments related to using separate role files
* 1322: removed image name replacement
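The log above is the evidence that all four controllers came up in the single training-operator process: each one logs `Starting Controller` and then `Starting workers`. A minimal sketch for checking a captured log (for example the output of `kubectl -n kubeflow logs` for the operator pod) for that last step; the helper names are hypothetical and not part of training-operator, and the controller names are taken from the log lines above.

```python
import re

# Controllers expected in the all-in-one training-operator, named as they
# appear in the log above (controller-runtime.manager.controller.<name>).
EXPECTED_CONTROLLERS = {
    "tf-operator",
    "mxnet-operator",
    "pytorchjob-operator",
    "xgboostjob-operator",
}

# Treat a controller as running once it has logged "Starting workers".
WORKERS_RE = re.compile(
    r"controller-runtime\.manager\.controller\.(?P<name>[\w-]+)\s+Starting workers"
)


def controllers_with_workers(log_text: str) -> set[str]:
    """Return the names of controllers that logged 'Starting workers'."""
    return {m.group("name") for m in WORKERS_RE.finditer(log_text)}


def missing_controllers(log_text: str) -> set[str]:
    """Return expected controllers that have not logged 'Starting workers' yet."""
    return EXPECTED_CONTROLLERS - controllers_with_workers(log_text)


if __name__ == "__main__":
    sample = (
        "2021-08-12T21:43:26.990Z INFO controller-runtime.manager.controller.tf-operator "
        'Starting workers {"worker count": 1}\n'
        "2021-08-12T21:43:26.991Z INFO controller-runtime.manager.controller.mxnet-operator "
        "Starting Controller\n"
    )
    print(sorted(missing_controllers(sample)))
    # prints ['mxnet-operator', 'pytorchjob-operator', 'xgboostjob-operator']
```

Matching on `Starting workers` rather than `Starting EventSource` is a deliberate choice: event sources start before the controller's worker goroutines, so only the last message indicates the controller is actually reconciling.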
1 parent 3e11cde commit 46c5864

Some content is hidden: large commits have part of their content hidden by default, so not every changed file is shown below.

46 files changed (+180 -331 lines)

Makefile (+1 -1)

```diff
@@ -38,7 +38,7 @@ help: ## Display this help.
 ##@ Development
 
 manifests: controller-gen ## Generate WebhookConfiguration, ClusterRole and CustomResourceDefinition objects.
-	$(CONTROLLER_GEN) $(CRD_OPTIONS) rbac:roleName=manager-role webhook paths="./pkg/apis/..." output:crd:artifacts:config=config/crd/bases
+	$(CONTROLLER_GEN) $(CRD_OPTIONS) rbac:roleName=manager-role webhook paths="./pkg/apis/..." output:crd:artifacts:config=manifests/base
 
 generate: controller-gen ## Generate code containing DeepCopy, DeepCopyInto, and DeepCopyObject method implementations.
 	$(CONTROLLER_GEN) object:headerFile="hack/boilerplate.go.txt" paths="./pkg/apis/..."
```
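With `output:crd:artifacts:config=manifests/base`, controller-gen writes one CRD file per resource into `manifests/base`, named `<group>_<plural>.yaml`. That naming is consistent with the `kubeflow.org_*.yaml` entries in the new base kustomization. A small sketch of the expected paths; the helper names are illustrative only, and the plurals are the four job kinds this commit handles.

```python
from pathlib import PurePosixPath

# Directory passed to controller-gen via output:crd:artifacts:config=... after this commit.
CRD_OUTPUT_DIR = PurePosixPath("manifests/base")


def crd_filename(group: str, plural: str) -> str:
    """controller-gen names each generated CRD file <group>_<plural>.yaml."""
    return f"{group}_{plural}.yaml"


def expected_crd_paths(group: str, plurals: list[str]) -> list[str]:
    """Paths that `make manifests` is expected to (re)generate."""
    return [str(CRD_OUTPUT_DIR / crd_filename(group, p)) for p in plurals]


if __name__ == "__main__":
    for path in expected_crd_paths(
        "kubeflow.org", ["tfjobs", "mxjobs", "pytorchjobs", "xgboostjobs"]
    ):
        print(path)  # e.g. manifests/base/kubeflow.org_tfjobs.yaml
```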

config/crd/kustomization.yaml (-13)

This file was deleted.

config/crd/kustomizeconfig.yaml (-19)

This file was deleted.

config/manager/manager.yaml (-57)

This file was deleted.

manifests/base/cluster-role-binding.yaml (+4 -4)

```diff
@@ -3,12 +3,12 @@ apiVersion: rbac.authorization.k8s.io/v1beta1
 kind: ClusterRoleBinding
 metadata:
   labels:
-    app: tf-job-operator
-  name: tf-job-operator
+    app: training-operator
+  name: training-operator
 roleRef:
   apiGroup: rbac.authorization.k8s.io
   kind: ClusterRole
-  name: tf-job-operator
+  name: training-operator
 subjects:
 - kind: ServiceAccount
-  name: tf-job-operator
+  name: training-operator
```
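The rename only holds together if the binding, the ClusterRole and the ServiceAccount move in lockstep: `roleRef.name` must name the ClusterRole and a subject must name the ServiceAccount. A stdlib-only sketch of that consistency check, assuming the manifests have already been parsed into dicts (YAML parsing is omitted; the function and constant names are hypothetical).

```python
import copy

# Parsed manifests as plain dicts, using the names from this commit.
CLUSTER_ROLE = {"kind": "ClusterRole", "metadata": {"name": "training-operator"}}
SERVICE_ACCOUNT = {"kind": "ServiceAccount", "metadata": {"name": "training-operator"}}
BINDING = {
    "kind": "ClusterRoleBinding",
    "metadata": {"name": "training-operator"},
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "ClusterRole",
        "name": "training-operator",
    },
    "subjects": [{"kind": "ServiceAccount", "name": "training-operator"}],
}


def binding_problems(binding: dict, role: dict, account: dict) -> list[str]:
    """Report references in the binding that do not match the given objects."""
    problems = []
    ref = binding["roleRef"]
    if ref["kind"] != role["kind"] or ref["name"] != role["metadata"]["name"]:
        problems.append(f"roleRef points at {ref['kind']}/{ref['name']}")
    sa_name = account["metadata"]["name"]
    if not any(
        s["kind"] == "ServiceAccount" and s["name"] == sa_name
        for s in binding["subjects"]
    ):
        problems.append(f"no subject for ServiceAccount/{sa_name}")
    return problems


if __name__ == "__main__":
    print(binding_problems(BINDING, CLUSTER_ROLE, SERVICE_ACCOUNT))  # prints []
    # A half-finished rename: roleRef still uses the old name.
    stale = copy.deepcopy(BINDING)
    stale["roleRef"]["name"] = "tf-job-operator"
    print(binding_problems(stale, CLUSTER_ROLE, SERVICE_ACCOUNT))
    # prints ['roleRef points at ClusterRole/tf-job-operator']
```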

manifests/base/cluster-role.yaml (+43 -96)

```diff
@@ -3,100 +3,47 @@ apiVersion: rbac.authorization.k8s.io/v1beta1
 kind: ClusterRole
 metadata:
   labels:
-    app: tf-job-operator
-  name: tf-job-operator
+    app: training-operator
+  name: training-operator
 rules:
-- apiGroups:
-  - kubeflow.org
-  resources:
-  - tfjobs
-  - tfjobs/status
-  - tfjobs/finalizers
-  verbs:
-  - '*'
-- apiGroups:
-  - apiextensions.k8s.io
-  resources:
-  - customresourcedefinitions
-  verbs:
-  - '*'
-- apiGroups:
-  - ""
-  resources:
-  - pods
-  - services
-  - endpoints
-  - events
-  verbs:
-  - '*'
-- apiGroups:
-  - apps
-  - extensions
-  resources:
-  - deployments
-  verbs:
-  - '*'
-- apiGroups:
-  - scheduling.volcano.sh
-  resources:
-  - podgroups
-  verbs:
-  - '*'
-
----
-
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: kubeflow-tfjobs-admin
-  labels:
-    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-admin: "true"
-aggregationRule:
-  clusterRoleSelectors:
-  - matchLabels:
-      rbac.authorization.kubeflow.org/aggregate-to-kubeflow-tfjobs-admin: "true"
-rules: []
-
----
-
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: kubeflow-tfjobs-edit
-  labels:
-    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
-    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-tfjobs-admin: "true"
-rules:
-- apiGroups:
-  - kubeflow.org
-  resources:
-  - tfjobs
-  - tfjobs/status
-  verbs:
-  - get
-  - list
-  - watch
-  - create
-  - delete
-  - deletecollection
-  - patch
-  - update
-
----
-
-apiVersion: rbac.authorization.k8s.io/v1
-kind: ClusterRole
-metadata:
-  name: kubeflow-tfjobs-view
-  labels:
-    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-view: "true"
-rules:
-- apiGroups:
-  - kubeflow.org
-  resources:
-  - tfjobs
-  - tfjobs/status
-  verbs:
-  - get
-  - list
-  - watch
+- apiGroups:
+  - kubeflow.org
+  resources:
+  - tfjobs
+  - mxjobs
+  - pytorchjobs
+  - xgboostjobs
+  - tfjobs/status
+  - pytorchjobs/status
+  - mxjobs/status
+  - xgboostjobs/status
+  verbs:
+  - "*"
+- apiGroups:
+  - apiextensions.k8s.io
+  resources:
+  - customresourcedefinitions
+  verbs:
+  - "*"
+- apiGroups:
+  - ""
+  resources:
+  - pods
+  - services
+  - endpoints
+  - events
+  verbs:
+  - "*"
+- apiGroups:
+  - apps
+  - extensions
+  resources:
+  - deployments
+  verbs:
+  - "*"
+- apiGroups:
+  - scheduling.volcano.sh
+  resources:
+  - podgroups
+  verbs:
+  - "*"
```
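The new ClusterRole grants the operator every verb on all four job kinds and their `/status` subresources. A simplified sketch of how such rules are evaluated; this is not the Kubernetes authorizer, and it ignores `resourceNames` and non-resource URLs. Resource strings are compared literally, so a subresource counts only if it is listed on its own (or covered by `"*"`). Run against the new rules, it shows that `tfjobs/finalizers`, which the old tf-job-operator role listed, is no longer covered. Whether the operator still needs that permission is not something this diff shows. `PolicyRule` and `allows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    api_groups: tuple[str, ...]
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


def _covers(allowed: tuple[str, ...], value: str) -> bool:
    # "*" matches anything; otherwise the exact string must be listed.
    return "*" in allowed or value in allowed


def allows(rules: list[PolicyRule], api_group: str, resource: str, verb: str) -> bool:
    """A request is allowed if any single rule covers its group, resource and verb."""
    return any(
        _covers(r.api_groups, api_group)
        and _covers(r.resources, resource)
        and _covers(r.verbs, verb)
        for r in rules
    )


# The five rules of the new training-operator ClusterRole.
NEW_RULES = [
    PolicyRule(
        ("kubeflow.org",),
        (
            "tfjobs", "mxjobs", "pytorchjobs", "xgboostjobs",
            "tfjobs/status", "pytorchjobs/status", "mxjobs/status", "xgboostjobs/status",
        ),
        ("*",),
    ),
    PolicyRule(("apiextensions.k8s.io",), ("customresourcedefinitions",), ("*",)),
    PolicyRule(("",), ("pods", "services", "endpoints", "events"), ("*",)),
    PolicyRule(("apps", "extensions"), ("deployments",), ("*",)),
    PolicyRule(("scheduling.volcano.sh",), ("podgroups",), ("*",)),
]

if __name__ == "__main__":
    print(allows(NEW_RULES, "kubeflow.org", "pytorchjobs/status", "update"))  # prints True
    print(allows(NEW_RULES, "", "pods", "create"))  # prints True
    print(allows(NEW_RULES, "kubeflow.org", "tfjobs/finalizers", "update"))  # prints False
```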

manifests/base/crd.yaml (-52)

This file was deleted.

manifests/base/deployment.yaml (-29)

This file was deleted.

manifests/base/kustomization.yaml (+8 -11)

```diff
@@ -2,14 +2,11 @@ apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 namespace: kubeflow
 resources:
-- crd.yaml
-- cluster-role-binding.yaml
-- cluster-role.yaml
-- deployment.yaml
-- service-account.yaml
-- service.yaml
-commonLabels:
-  app: tf-job-operator
-  kustomize.component: tf-job-operator
-  app.kubernetes.io/component: tfjob
-  app.kubernetes.io/name: tf-job-operator
+- kubeflow.org_tfjobs.yaml
+- kubeflow.org_mxjobs.yaml
+- kubeflow.org_pytorchjobs.yaml
+- kubeflow.org_xgboostjobs.yaml
+- cluster-role-binding.yaml
+- cluster-role.yaml
+- service-account.yaml
+- service.yaml
```
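Dropping `commonLabels` affects more than metadata: kustomize also writes common labels into selector fields, including a Service's `spec.selector`. A rough model of that transform for a Service, as a sketch only (it is not kustomize's implementation and covers only this one kind; the function name is hypothetical). It shows that with the old base the rendered Service selected pods on five labels, whereas after this commit the selector is just the `name` label from service.yaml.

```python
import copy

# commonLabels that this commit removes from manifests/base/kustomization.yaml.
OLD_COMMON_LABELS = {
    "app": "tf-job-operator",
    "kustomize.component": "tf-job-operator",
    "app.kubernetes.io/component": "tfjob",
    "app.kubernetes.io/name": "tf-job-operator",
}


def apply_common_labels(service: dict, labels: dict) -> dict:
    """Rough model of commonLabels on a Service: the labels are merged into
    metadata.labels and into spec.selector. The input is not modified."""
    out = copy.deepcopy(service)
    out.setdefault("metadata", {}).setdefault("labels", {}).update(labels)
    out.setdefault("spec", {}).setdefault("selector", {}).update(labels)
    return out


if __name__ == "__main__":
    old_service = {"spec": {"selector": {"name": "tf-job-operator"}}}
    rendered = apply_common_labels(old_service, OLD_COMMON_LABELS)
    print(sorted(rendered["spec"]["selector"]))
    # prints ['app', 'app.kubernetes.io/component', 'app.kubernetes.io/name', 'kustomize.component', 'name']
```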

manifests/base/service-account.yaml (+2 -2)

```diff
@@ -2,5 +2,5 @@ apiVersion: v1
 kind: ServiceAccount
 metadata:
   labels:
-    app: tf-job-operator
-  name: tf-job-operator
+    app: training-operator
+  name: training-operator
```

manifests/base/service.yaml (+3 -3)

```diff
@@ -7,13 +7,13 @@ metadata:
     prometheus.io/scrape: "true"
     prometheus.io/port: "8443"
   labels:
-    app: tf-job-operator
-  name: tf-job-operator
+    app: training-operator
+  name: training-operator
 spec:
   ports:
   - name: monitoring-port
     port: 8443
     targetPort: 8443
   selector:
-    name: tf-job-operator
+    name: training-operator
   type: ClusterIP
```
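The Service only gets endpoints for pods whose labels match `spec.selector`, so after the rename the monitoring port is reachable only if the operator pods are labeled `name: training-operator`. A minimal sketch of equality-based selector matching; the Deployment is not part of this diff, so the pod label sets in the usage lines are hypothetical.

```python
def selects(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based Service selector: every key/value pair must be on the pod."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# spec.selector from manifests/base/service.yaml after this commit.
SERVICE_SELECTOR = {"name": "training-operator"}

if __name__ == "__main__":
    print(selects(SERVICE_SELECTOR, {"name": "training-operator"}))  # prints True
    print(selects(SERVICE_SELECTOR, {"name": "tf-job-operator"}))  # prints False
```

Extra labels on the pod do not prevent a match; only labels named in the selector are compared.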
