[GPU] Updated Kops GPU Setup Hook #4971
Conversation
* Changed the Dockerfile base image to debian for systemctl and bash.
* Added autodetection of the AWS EC2 instance class (p2, p3, g3); see the sketch after this list.
* For each detected instance class, added installation of the proper driver for the specific NVIDIA hardware:
  - G3 instance types require Nvidia Grid Series/GRID K520 drivers
  - P2 instance types require Nvidia Tesla K-Series drivers
  - P3 instance types require Nvidia Tesla V-Series drivers
* Set custom nvidia-smi configurations per EC2 instance class, according to the AWS GPU optimization document.
* Added installation and patching of the latest CUDA 9.1 libraries.
* Added a restart of kubelet on the kube node at the end of a successful hook run, fixing a race condition where kubelet would start before the Nvidia drivers were loaded and therefore Kubernetes would not detect GPUs on the node.
* Ensured the Nvidia driver build uses the same gcc version as the one used to build the default kops kernel.
* Fixed an issue where *every* run of this container would download all of the NVIDIA drivers + CUDA libs (1GB+), by caching the files on the kube node.
* Fixed an issue where, after a reboot, subsequent runs of this script would fail because mknod would try to create a previously-created device node. This previously caused a download loop as systemd perpetually restarted the unit upon failure.
* Tested with p2.xlarge, p3.2xlarge, and g3.4xlarge.
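For readers skimming the conversation, here is a minimal, hypothetical sketch of the instance-class autodetection described above. Variable names and structure are assumptions, and the driver URLs are placeholders rather than real download links.

```bash
#!/bin/bash
# Hypothetical sketch of the autodetection described above; names and URLs are
# placeholders, not the hook's actual values.
set -euo pipefail

# The EC2 metadata service reports the full instance type, e.g. "p2.xlarge".
INSTANCE_TYPE=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
INSTANCE_CLASS=${INSTANCE_TYPE%%.*}   # "p2", "p3", "g3", ...

case "${INSTANCE_CLASS}" in
  g3) DRIVER_URL="<nvidia-grid-series-driver-url>" ;;    # GRID K520-class driver
  p2) DRIVER_URL="<nvidia-tesla-k-series-driver-url>" ;; # Tesla K-Series driver
  p3) DRIVER_URL="<nvidia-tesla-v-series-driver-url>" ;; # Tesla V-Series driver
  *)
    echo "No NVIDIA driver mapping for instance class ${INSTANCE_CLASS}" >&2
    exit 1
    ;;
esac

echo "Instance ${INSTANCE_TYPE}: would download and install ${DRIVER_URL}"
```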
/assign @KashifSaadat
chroot ${ROOTFS_DIR} $filepath_host --accept-eula --silent
touch $filepath_installed  # Mark successful installation
else
echo "Unable to handle file $filepath_host"
Should this `exit 1`?
Fixed. New docker image tag at: dcwangmit01/aws-nvidia-bootstrap:0.1.1
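For context, a rough sketch of what the fix amounts to; the surrounding if/else structure is assumed from the snippets quoted in this review rather than copied from the final code.

```bash
# Assumed shape of the install dispatch (other branches, e.g. the CUDA
# --accept-eula one, omitted); the "exit 1" in the else branch is the fix.
if [[ $download =~ .*NVIDIA.* ]]; then
  chroot ${ROOTFS_DIR} /bin/bash -c "CC=/usr/bin/gcc-7 $filepath_host --accept-license --silent"
  touch $filepath_installed  # Mark successful installation
else
  echo "Unable to handle file $filepath_host" >&2
  exit 1  # Fail the unit so the error is surfaced instead of being silently skipped
fi
```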
echo "Installing file $filename on host" | ||
if [[ $download =~ .*NVIDIA.* ]]; then | ||
# Install the nvidia package (using gcc-7) | ||
chroot ${ROOTFS_DIR} /bin/bash -c "CC=/usr/bin/gcc-7 $filepath_host --accept-license --silent" |
Are these options (with the default installation directory) compatible with device plugins?
Also, for device plugins, this is the Google approach (an alternative to the Nvidia one) to preparing the node.
This patch does not take the device plugin approach, but it also does not preclude it. It does not install nvidia-docker, and it does not swap the default container runtime; those could be added on top of the existing method if we wanted to implement device plugins, as a follow-on pull request.
The linked PR is interesting because it installs drivers via a DaemonSet. If one didn't mind running privileged containers on Kubernetes, it would be an interesting alternative to kops hooks. It's also nice because one could deploy it via a Helm chart rather than editing kops instancegroup manifests.
I did take a look at the Google setup script, hoping to ditch what I just wrote in this PR. Unfortunately, just like this PR, its setup instructions are cloud-specific.
The issue is that the accelerator resource is already deprecated.
That is a fair point, and a fine issue to address in a follow-on PR. The changes here are still prerequisites for making device plugins work; consider this a solid step in that direction that someone can hopefully take the last mile in a logical follow-on PR.
Also, note that without these changes, GPUs on AWS P3 and G3 instances don't work with kops today on 1.8, 1.9, or 1.10 (released 2 weeks ago).
The question we have to ask ourselves is whether this PR moves the ball toward the goal line. If not, then no worries.
Yes, for now the main thing to check is whether the Nvidia installer needs any particular directory setup passed as a parameter that could help device plugin support. If you look, I think the Google solution tries to pass good options to the Nvidia installer.
/cc @RenaudWasTaken Any feedback?
Also adding @flx42, will comment as soon as I have some time :)
I'm testing this new setup using P2 instances with HorizontalPodAutoscaler and cluster-autoscaler to test dynamically scaling GPU nodes. I'm seeing that when a new instance is initialized, the container I'm using to run a GPU process (TensorFlow Serving) starts before the setup hook finishes running and does not use the GPU (it does not fail, it just uses CPUs). Is there a way to stop pods from running until the setup hook finishes? Or should this problem be handled by adding to the container CMD a script that waits or fails until the GPU is available?
@richardbrks Thanks for testing out the PR. I hope it works for you. Regarding your question:
Yes, there is a way. Be sure you are setting a gpu-limit in your pod spec.
Do a "kubectl describe nodes" and look under Capacity for any node. If you have set things up correctly you should see for nodes that don't have GPU and for nodes that have GPU but has not had the hook finish running, the following capacity.
At the end of the hook run the kubelet is restarted. Only at this point does the node's Capacity get updated to include the GPU resource.
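A quick way to watch for that transition (this assumes the pre-device-plugin accelerator resource name):

```bash
# List nodes and their reported NVIDIA GPU capacity; GPU nodes show no (or zero)
# capacity until the hook finishes and kubelet restarts.
kubectl describe nodes | grep -iE '^Name:|nvidia-gpu'
```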
At that point, only if you have set the nvidia-gpu limit in your pod specification will a pod tagged with such a limit start running on the node. I actually had the inverse problem of non-GPU pods running on the GPU machines; that was easily taken care of with taints and tolerations.
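Putting the two pieces together, a hypothetical pod spec might look like the following. The taint key/value is only an example of the taints-and-tolerations setup mentioned above, and the image is a placeholder.

```bash
# Hypothetical example only: the alpha accelerator resource limit keeps the pod
# Pending until the node reports GPU capacity, and the toleration lets it land
# on a tainted GPU node. Taint key/value and image are assumptions.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:9.1-base        # placeholder image
    command: ["nvidia-smi"]
    resources:
      limits:
        alpha.kubernetes.io/nvidia-gpu: 1
  tolerations:
  - key: dedicated                     # example taint applied to the GPU instancegroup
    operator: Equal
    value: gpu
    effect: NoSchedule
EOF
```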
@dcwangmit01 that worked! Thanks for your help (and for this PR)!
# AWS Instance Types to Nvidia Card Mapping (cut and pasted from AWS docs)
# Load the correct driver for the correct instance type
#   Instances    Product Type    Product Series    Product
#   G2           GRID            GRID Series       GRID K520   <-- I think they meant G3
This is correct, but I think G2 is a deprecated instance type. It no longer appears on the pricing page, though we have our own G2 instances running.
According to https://aws.amazon.com/ec2/instance-types/ and https://aws.amazon.com/ec2/instance-types/g3/, G3 instances are based on Tesla M60s.
@115100 Thanks for the clarifications.
So, based on what you said:
- We don't have to worry about G2 instances because one cannot spin them up anymore due to deprecation (am I wrong?)
- The driver package for the G3 instances is suboptimal but working (I tested it). I'm looking at the Nvidia website right now, and the Tesla M60 hardware resolves to the following driver, which matches that of P2 and P3: http://us.download.nvidia.com/tesla/390.46/NVIDIA-Linux-x86_64-390.46.run
I can make the doc change and driver swap later after more feedback, and if we think this PR has any chance of merging.
G2 instances can still be launched, but Nvidia's own drivers don't support kernels >= 4.9; I had to write my own patch to get the driver to work. Personally, I think it's safe to ignore G2.
On G3, I don't have any instances up to test them but the rest of the PR looks good to me.
> Nvidia's own drivers don't support kernels >= 4.9
That's not true. You didn't pick the right driver package then.
Ah, I see a new set of drivers. It was a while back and it did take a while to support the newer kernel though.
Please don't use this resource anymore. It was deprecated in 1.10 (kubernetes/kubernetes#57384) and is getting removed in 1.11 (kubernetes/kubernetes#61498). Use device plugins, which introduce a replacement resource, instead.
Yes, taints and tolerations are the right approach for this. If you use device plugins and ...
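For comparison, once the NVIDIA device plugin is in use instead of the deprecated accelerator resource, the request side of a pod changes roughly as sketched below (an illustration, not code from this PR).

```bash
# Accelerator era (deprecated in 1.10, removed in 1.11):
#   resources: { limits: { alpha.kubernetes.io/nvidia-gpu: 1 } }
# Device-plugin era (e.g. the NVIDIA k8s-device-plugin DaemonSet):
#   resources: { limits: { nvidia.com/gpu: 1 } }
# Once the plugin registers, the node advertises the new resource:
kubectl describe nodes | grep -i 'nvidia.com/gpu'
```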
A couple of questions
@@ -12,9 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-FROM alpine:3.6
+FROM debian:jessie
Why the switch? We should probably use the base k8s container ... see protokube
Never mind, I see why you switched. We should probably use the same container as protokube.
Must have or should have? The difference is I've already got a lot of mileage on the existing base image.
We're switching away from alpine generally, AIUI, so +1 to debian.
/ok-to-test
/assign @rdrgmnzs @mikesplain
Can I get a review by another bash hacker? Any comments from anyone else?
From the bash perspective these look like some good improvements. Good organization.
/lgtm
Could we suggest enabling ...
Great work, thanks for the contribution @dcwangmit01! The comments are also very helpful in understanding the flow. I'd suggest maybe making some amendments to the related GPU documentation regarding your findings / quirks and the deprecation discussed above. Otherwise this LGTM 👍
Seems that the repository matrix already supports Debian distributions. What do you think?
This LGTM, but... I'm a little confused by @bhack's comments. Should we merge this @dcwangmit01, or should we switch to use the container-engine-accelerators? Or both :-)?
@justinsb There's nothing in this PR that precludes or is incompatible with device plugins, which are required for Kubernetes >= 1.11.0, which kops does not officially support yet. What's in this PR is better than what currently exists. I'd like to see it merged, of course, but it's not a big deal. I'm working on the device plugin version as we speak, and it uses the same code; I'll have a PR in the coming weeks. The question is: do we want to help people that are still using accelerators until they are forced to upgrade to device plugins in 1.11? I'd say yes.
I agree with the above comments to merge the PR. I have been using this PR for some time to set up GPU instances in our k8s 1.9 cluster for ML workloads. Everything checks out nicely :)
I think if it is ready it is better than the current kops GPU status. Then we can introduce Nvidia container-engine-accelerators with another PR, like Kubespray is trying to do at kubernetes-sigs/kubespray#2913.
Seems there is consensus to merge :-) Thanks for clarifying, and thank you for the PR @dcwangmit01
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: dcwangmit01, justinsb, mikesplain
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Can we just add a reference to ...
VPC limits
/retest
I just cleared out the cruft that was causing us to hit the VPC limits, so this should be getting better...
/retest
/retest
Thanks for monitoring the tests @justinsb and @mikesplain. I've been watching as well. What's the process for updating the docker image in the kopeio repository? Is it human, or the build system? I haven't updated the readme because the image is sitting in my public dockerhub. It could be re-tagged and pushed, with the doc subsequently updated. @bhack The usage of ...
Hi, any idea how to fix that?
Hi @erez-rabih, you must be using a kops OS image where the kernel is built with the same gcc version that the distribution installs by default. That's how things should be. However, the default kops images that I've seen in the stable channel manifest all have their kernels built with GCC 7.3.0, whilst the distribution's default gcc packages are GCC 4.9.2. The installation scripts assume a default kops image with a kernel compiled with GCC 7.3.0, as specified in the stable channel manifest, and thus gcc-7 is force-installed and the Nvidia drivers are forced to build with gcc-7. This hook will not work anywhere the kernel has not been compiled with GCC 7.3.0. Perhaps you are using an older OS image build. This will also not work on Debian Stretch images, where the kernel and default gcc are both gcc-6. Try upgrading to one of the current stable images in the stable channel manifest: do a kops edit cluster, set the image, and then do a rolling update. This morning I spun up a few different kops images from the stable channel manifest to check kernel and gcc versions; I've pasted the output below. You should choose one of those images. -dave
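If you want to run the same check on your own nodes or images, a small sketch (assuming a Debian-based kops image):

```bash
# Compare the compiler that built the running kernel with the gcc the hook uses.
cat /proc/version              # e.g. "... (gcc version 7.3.0 ...)" on images this hook expects
gcc --version | head -n1       # the distribution's default gcc
ls /usr/bin/gcc-* 2>/dev/null  # shows whether gcc-7 is installed alongside the default
```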
@dcwangmit01 Thanks for your detailed answer; unfortunately, it didn't help.
The hook is set to run the image:
and the nvidia-installer log shows:
@erez-rabih Try the 1.10 jessie image. That's the one I've used. Also try the new PR in legacy mode here: #5502
@dcwangmit01 there is no jessie 1.10
@erez-rabih Let's move this conversation into an issue, if it needs to continue. Here's how to find images.
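One way to do that, sketched under the assumption that the relevant image names are the ones listed in the kops stable channel manifest discussed above:

```bash
# Option 1: read the image names straight out of the kops stable channel manifest.
curl -s https://raw.githubusercontent.com/kubernetes/kops/master/channels/stable | grep -B1 -A3 'images:'

# Option 2: search AWS for a specific image name taken from that manifest
# (the name pattern below is just an example).
aws ec2 describe-images --region us-east-1 \
  --filters "Name=name,Values=k8s-1.10-debian-jessie*" \
  --query 'Images[].Name'
```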
Docker image for testing located at: dcwangmit01/aws-nvidia-bootstrap:0.1.1
Compatible with instructions here: https://github.com/kubernetes/kops/blob/master/docs/gpu.md