Commit 61accd1

added initial structure for docs (kubernetes-sigs#57)
1 parent 2daf75c commit 61accd1

7 files changed

+193
-22
lines changed

README.md

+5-17
@@ -1,33 +1,21 @@
11
# JobSet
22

3-
JobSet: An API for managing a group of Jobs as a unit.
3+
JobSet is a Kubernetes-native API for managing a group of [k8s Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job/) as a unit. It aims to offer a unified API for deploying HPC (e.g., MPI) and AI/ML training workloads (PyTorch, JAX, TensorFlow, etc.) on Kubernetes.
44

5-
# Installation
65

7-
### Prerequisites
8-
[cert-manager](https://cert-manager.io/) is required to create certificates for the webhook. To install
9-
it on your cluster, run the following command:
10-
```
11-
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
12-
```
13-
See more details about [cert-manager installation](https://cert-manager.io/docs/installation/).
6+
## Installation
147

15-
To install the JobSet CRD and deploy the controller on the cluster selected on your `~/.kubeconfig`, run the following commands:
16-
```
17-
git clone https://github.com/kubernetes-sigs/jobset.git
18-
cd jobset
8+
Read the [installation guide](/docs/setup/install.md) to learn more.
199

20-
IMAGE_REGISTRY=<registry>/<project> make image-push deploy
21-
```
2210

2311
## Community, discussion, contribution, and support
2412

2513
Learn how to engage with the Kubernetes community on the [community page](http://kubernetes.io/community/).
2614

2715
You can reach the maintainers of this project at:
2816

29-
- [Slack](https://kubernetes.slack.com/messages/sig-apps)
30-
- [Mailing List](https://groups.google.com/forum/#!forum/kubernetes-sig-apps)
17+
- [Slack](https://kubernetes.slack.com/messages/wg-batch)
18+
- [Mailing List](https://groups.google.com/a/kubernetes.io/g/wg-batch)
3119

3220
### Code of conduct
3321

cloudbuild.yaml

+1-4
@@ -1,6 +1,3 @@
1-
2-
3-
41
# See https://cloud.google.com/cloud-build/docs/build-config
52
timeout: 3000s
63
# A build step specifies an action that you want Prow to perform.
@@ -24,4 +21,4 @@ substitutions:
2421
# a branch like 'main' or 'release-0.2', or a tag like 'v0.2'.
2522
_PULL_BASE_REF: 'main'
2623
options:
27-
substitution_option: ALLOW_LOOSE
24+
substitution_option: ALLOW_LOOSE

docs/README.md

+23-1
@@ -1 +1,23 @@
1-
placeholder
1+
# JobSet documentation
2+
3+
Welcome to JobSet!
4+
JobSet is a Kubernetes-native API for running batch, HPC, and AI/ML training workloads.
5+
6+
## Understand JobSet
7+
8+
Learn about JobSet and its fundamental concepts.
9+
10+
[**View concepts**](concepts)
11+
12+
## Setup
13+
14+
Follow step-by-step instructions on how to get JobSet running on your cluster.
15+
16+
[**View Setup**](setup)
17+
18+
## Learn how to use JobSet
19+
20+
Look up common tasks and how to perform them using a short sequence
21+
of steps.
22+
23+
[**View tasks**](tasks)

docs/concepts/README.md

+4
@@ -0,0 +1,4 @@
1+
# Concepts
2+
3+
This section of the documentation helps you learn about the JobSet API and its semantics.
4+

docs/setup/README.md

+5
@@ -0,0 +1,5 @@
1+
# Setup
2+
3+
This section shows you how to deploy JobSet in your cluster.
4+
5+
* [Installation](install.md)

docs/setup/install.md

+151
@@ -0,0 +1,151 @@
1+
# Installation
2+
3+
## Before you begin
4+
5+
Make sure the following conditions are met:
6+
7+
- A Kubernetes cluster with version 1.21 or newer is running. Learn how to [install the Kubernetes tools](https://kubernetes.io/docs/tasks/tools/).
8+
- The `SuspendJob` [feature gate][feature_gate] is enabled. The gate is enabled by default in Kubernetes 1.22 and newer, and it became stable in Kubernetes 1.24.
9+
- The kubectl command-line tool is configured to communicate with your cluster.
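
As a rough way to check the version requirement above, the following sketch compares a version string against the 1.21 minimum. `version_at_least` is a hypothetical helper, not part of JobSet or kubectl, and it compares only the major and minor components. The commented usage line assumes `jq` is installed.

```shell
# Hypothetical helper, not part of JobSet: succeeds when version $1 is at
# least version $2. Accepts forms such as "v1.24.3" or "1.21".
version_at_least() {
  val_have="${1#v}"; val_want="${2#v}"            # drop an optional leading "v"
  have_major="${val_have%%.*}"; want_major="${val_want%%.*}"
  have_minor="${val_have#*.}";  want_minor="${val_want#*.}"
  # Keep only the leading digits of the minor component ("24.3" becomes "24").
  have_minor="${have_minor%%[!0-9]*}"; want_minor="${want_minor%%[!0-9]*}"
  if [ "$have_major" -ne "$want_major" ]; then
    [ "$have_major" -gt "$want_major" ]
    return
  fi
  [ "$have_minor" -ge "$want_minor" ]
}

# Possible usage (assumes jq is available):
# version_at_least "$(kubectl version -o json | jq -r .serverVersion.gitVersion)" 1.21 \
#   || echo "Cluster is older than Kubernetes 1.21"
```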
10+
11+
<!-- Uncomment once jobset publishes metrics -->
12+
<!-- JobSet publishes [metrics](/docs/reference/metrics) to monitor its operators. -->
13+
<!-- You can scrape these metrics with Prometheus. -->
14+
<!-- Use [kube-prometheus](https://github.com/prometheus-operator/kube-prometheus) -->
15+
<!-- if you don't have your own monitoring system. -->
16+
17+
<!-- The webhook server in JobSet uses an internal cert management for provisioning certificates. If you want to use -->
18+
<!-- a third-party one, e.g. [cert-manager](https://github.com/cert-manager/cert-manager), follow these steps: -->
19+
<!-- 1. Set `internalCertManagement.enable` to `false` in [config file](#install-a-custom-configured-released-version). -->
20+
<!-- 2. Comment out the `internalcert` folder in `config/default/kustomization.yaml`. -->
21+
<!-- 3. Enable `cert-manager` in `config/default/kustomization.yaml` and uncomment all sections with 'CERTMANAGER'. -->
22+
23+
[feature_gate]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
24+
25+
26+
<!-- Uncomment once we release the first version -->
27+
<!-- ## Install a released version -->
28+
29+
<!-- To install a released version of JobSet in your cluster, run the following command: -->
30+
31+
<!-- ```shell -->
32+
<!-- VERSION=v0.1.0 -->
33+
<!-- kubectl apply -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml -->
34+
<!-- ``` -->
35+
36+
<!-- <\!-- Uncomment once we have a prometheus setup -\-> -->
37+
<!-- <\!-- ### Add metrics scraping for prometheus-operator -\-> -->
38+
39+
<!-- <\!-- _Available in JobSet v0.2.1 and later_ -\-> -->
40+
41+
<!-- <\!-- To allow [prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) -\-> -->
42+
<!-- <\!-- to scrape metrics from jobset components, run the following command: -\-> -->
43+
44+
<!-- <\!-- ```shell -\-> -->
45+
<!-- <\!-- kubectl apply -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/prometheus.yaml -\-> -->
46+
<!-- <\!-- ``` -\-> -->
47+
48+
<!-- ### Uninstall -->
49+
50+
<!-- To uninstall a released version of JobSet from your cluster, run the following command: -->
51+
52+
<!-- ```shell -->
53+
<!-- VERSION=v0.1.0 -->
54+
<!-- kubectl delete -f https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml -->
55+
<!-- ``` -->
56+
57+
<!-- <\!-- Uncomment once we have component config setup -\-> -->
58+
<!-- <\!-- ## Install a custom-configured released version -\-> -->
59+
60+
<!-- <\!-- To install a custom-configured released version of JobSet in your cluster, execute the following steps: -\-> -->
61+
62+
<!-- <\!-- 1. Download the release's `manifests.yaml` file: -\-> -->
63+
64+
<!-- <\!-- ```shell -\-> -->
65+
<!-- <\!-- VERSION=v0.1.0 -\-> -->
66+
<!-- <\!-- wget https://github.com/kubernetes-sigs/jobset/releases/download/$VERSION/manifests.yaml -\-> -->
67+
<!-- <\!-- ``` -\-> -->
68+
<!-- <\!-- 2. With an editor of your preference, open `manifests.yaml`. -\-> -->
69+
<!-- <\!-- 3. In the `jobset-manager-config` ConfigMap manifest, edit the -\-> -->
70+
<!-- <\!-- `controller_manager_config.yaml` data entry. The entry represents -\-> -->
71+
<!-- <\!-- the default JobSet Configuration -\-> -->
72+
<!-- <\!-- struct ([[email protected]](https://pkg.go.dev/sigs.k8s.io/[email protected]/apis/config/v1alpha1#Configuration)). -\-> -->
73+
<!-- <\!-- The contents of the ConfigMap are similar to the following: -\-> -->
74+
75+
76+
<!-- <\!-- ```yaml -\-> -->
77+
<!-- <\!-- apiVersion: v1 -\-> -->
78+
<!-- <\!-- kind: ConfigMap -\-> -->
79+
<!-- <\!-- metadata: -\-> -->
80+
<!-- <\!-- name: jobset-manager-config -\-> -->
81+
<!-- <\!-- namespace: jobset-system -\-> -->
82+
<!-- <\!-- data: -\-> -->
83+
<!-- <\!-- controller_manager_config.yaml: | -\-> -->
84+
<!-- <\!-- apiVersion: config.jobset.x-k8s.io/v1alpha1 -\-> -->
85+
<!-- <\!-- kind: Configuration -\-> -->
86+
<!-- <\!-- namespace: jobset-system -\-> -->
87+
<!-- <\!-- health: -\-> -->
88+
<!-- <\!-- healthProbeBindAddress: :8081 -\-> -->
89+
<!-- <\!-- metrics: -\-> -->
90+
<!-- <\!-- bindAddress: :8080 -\-> -->
91+
<!-- <\!-- webhook: -\-> -->
92+
<!-- <\!-- port: 9443 -\-> -->
93+
<!-- <\!-- internalCertManagement: -\-> -->
94+
<!-- <\!-- enable: true -\-> -->
95+
<!-- <\!-- webhookServiceName: jobset-webhook-service -\-> -->
96+
<!-- <\!-- webhookSecretName: jobset-webhook-server-cert -\-> -->
97+
<!-- <\!-- ``` -\-> -->
98+
99+
<!-- <\!-- 4. Apply the customized manifests to the cluster: -\-> -->
100+
101+
<!-- <\!-- ```shell -\-> -->
102+
<!-- <\!-- kubectl apply -f manifests.yaml -\-> -->
103+
<!-- <\!-- ``` -\-> -->
104+
105+
## Install the latest development version
106+
107+
To install the latest development version of JobSet in your cluster, run the
108+
following command:
109+
110+
```shell
111+
kubectl apply -k "github.com/kubernetes-sigs/jobset/config/default?ref=main"
112+
```
113+
114+
The controller runs in the `jobset-system` namespace.
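
To confirm that the controller came up, one option is to list the pods in that namespace and wait for its Deployments to become available. The sketch below wraps both steps in `jobset_check`, a hypothetical helper that is not part of JobSet. The 120-second timeout is an arbitrary assumption.

```shell
# Hypothetical helper, not part of JobSet: list the pods in the controller
# namespace, then wait until every Deployment there reports Available.
jobset_check() {
  ns="${1:-jobset-system}"   # namespace named in this guide
  kubectl get pods --namespace "$ns" &&
    kubectl wait deployment --all --namespace "$ns" \
      --for=condition=Available --timeout=120s
}

# Possible usage:
# jobset_check
```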
115+
116+
### Uninstall
117+
118+
To uninstall JobSet, run the following command:
119+
120+
```shell
121+
kubectl delete -k github.com/kubernetes-sigs/jobset/config/default
122+
```
123+
124+
## Build and install from source
125+
126+
To build JobSet from source and install it in your cluster, run the following
127+
commands:
128+
129+
```sh
130+
git clone https://github.com/kubernetes-sigs/jobset.git
131+
cd jobset
132+
IMAGE_REGISTRY=<registry>/<project> make image-push deploy
133+
```
134+
135+
<!-- Uncomment once we have a prometheus setup -->
136+
<!-- ### Add metrics scraping for prometheus-operator -->
137+
138+
<!-- To allow [prometheus-operator](https://github.com/prometheus-operator/prometheus-operator) -->
139+
<!-- to scrape metrics from jobset components, run the following command: -->
140+
141+
<!-- ```shell -->
142+
<!-- make prometheus -->
143+
<!-- ``` -->
144+
145+
### Uninstall
146+
147+
To uninstall JobSet, run the following command:
148+
149+
```sh
150+
make undeploy
151+
```

docs/tasks/README.md

+4
@@ -0,0 +1,4 @@
1+
# Tasks
2+
3+
This section shows you how to perform common tasks with the JobSet API.
4+
