Skip to content
This repository was archived by the owner on Nov 17, 2023. It is now read-only.
This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Failure on docker run for ubuntu_build_cuda #16950

Closed
@larroy

Description

@larroy

Description

There seems to be a problem with nvidia docker in EC2 when running in recent 18.04 environment.

time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7


Traceback (most recent call last):
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 261, in _raise_for_status
    response.raise_for_status()
  File "/home/piotr/.local/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http+docker://localhost/v1.35/containers/4fafc8918ee1a6c4897cfc5f107cd1aad43b1cfbc9ecc2e438b9d4681c3e9824/start

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ci/build.py", line 454, in <module>
    sys.exit(main())
  File "ci/build.py", line 378, in main
    local_ccache_dir=args.ccache_dir, environment=environment)
  File "ci/build.py", line 248, in container_run
    environment=environment)
  File "/home/piotr/mxnet/ci/safe_docker_run.py", line 123, in run
    container = self._add_container(self._docker_client.containers.run(*args, **kwargs))
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 809, in run
    container.start()
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/models/containers.py", line 400, in start
    return self.client.api.start(self.id, **kwargs)
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/utils/decorators.py", line 19, in wrapped
    return f(self, resource_id, *args, **kwargs)
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/container.py", line 1095, in start
    self._raise_for_status(res)
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/api/client.py", line 263, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/home/piotr/.local/lib/python3.6/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
docker.errors.APIError: 500 Server Error: Internal Server Error ("OCI runtime create failed: container_linux.go:346: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/1e6555a578e7365e7439fbfa43d6c5e82fcee89bd5438b907c3c4b8c3ea08011/merged/usr/bin/nvidia-smi: file exists\\\\n\\\"\"": unknown")
Command exited with non-zero status 1
1.87user 0.84system 27:34.59elapsed 0%CPU (0avgtext+0avgdata 68112maxresident)k
0inputs+0outputs (0major+38834minor)pagefaults 0swaps
piotr@34-215-40-140:1: ~/mxnet [upstream_master]> time ci/build.py --no-cache --nvidiadocker --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_cuda101_cudnn7

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions