Commit fad780e

Document valid config options for ECS / Fargate (#156)
* [docs] document valid config options for ECS / Fargate
* add link to troubleshooting guide
* linting
1 parent 80965b4 commit fad780e

2 files changed, +37 -1 lines changed

dask_cloudprovider/aws/ecs.py

Lines changed: 5 additions & 0 deletions
@@ -453,10 +453,12 @@ class ECSCluster(SpecCluster):
         The amount of CPU to request for the scheduler in milli-cpu (1/1024).

         Defaults to ``1024`` (one vCPU).
+        See the `troubleshooting guide`_ for information on the valid values for this argument.
     scheduler_mem: int (optional)
         The amount of memory to request for the scheduler in MB.

         Defaults to ``4096`` (4GB).
+        See the `troubleshooting guide`_ for information on the valid values for this argument.
     scheduler_timeout: str (optional)
         The scheduler task will exit after this amount of time if there are no clients connected.

@@ -471,10 +473,12 @@ class ECSCluster(SpecCluster):
         The amount of CPU to request for worker tasks in milli-cpu (1/1024).

         Defaults to ``4096`` (four vCPUs).
+        See the `troubleshooting guide`_ for information on the valid values for this argument.
     worker_mem: int (optional)
         The amount of memory to request for worker tasks in MB.

         Defaults to ``16384`` (16GB).
+        See the `troubleshooting guide`_ for information on the valid values for this argument.
     worker_gpu: int (optional)
         The number of GPUs to expose to the worker.

@@ -636,6 +640,7 @@ class ECSCluster(SpecCluster):
     you must ensure the NVIDIA CUDA toolkit is installed with a version that matches the host machine
     along with ``dask-cuda``.

+    .. _troubleshooting guide: ./troubleshooting.html#invalid-cpu-or-memory
     """

     def __init__(

doc/source/troubleshooting.rst

Lines changed: 32 additions & 1 deletion
@@ -15,4 +15,35 @@ If you are unable to connect to this address it is likely that there is somethin
 for example you may have corporate policies implementing additional firewall rules on your account.

 To reduce the chances of this happening it is often simplest to run Dask Cloudprovider from within the cloud you are trying
-to use and configure private networking only. See your specific cluster manager docs for info.
+to use and configure private networking only. See your specific cluster manager docs for info.
+
+Invalid CPU or Memory
+---------------------
+
+When working with ``FargateCluster`` or ``ECSCluster``, CPU and memory arguments can only take values from a fixed set of combinations.
+
+So, for example, code like this will result in an error
+
+.. code-block:: python
+
+    from dask_cloudprovider import FargateCluster
+    cluster = FargateCluster(
+        image="daskdev/dask:latest",
+        worker_cpu=256,
+        worker_mem=30720,
+        n_workers=2,
+        fargate_use_private_ip=False,
+        scheduler_timeout="15 minutes"
+    )
+    client = Client(cluster)
+    cluster
+
+    # botocore.errorfactory.ClientException:
+    # An error occurred (ClientException) when calling the RegisterTaskDefinition operation:
+    # No Fargate configuration exists for given values.
+
+
+This is because ECS and Fargate task definitions with ``CPU=256`` cannot have as much memory as that code is requesting.
+
+The AWS-accepted set of combinations is documented at
+https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-cpu-memory-error.html.
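
The constraint described in the new troubleshooting section could be checked locally before a cluster is created. The following is a minimal sketch, not part of `dask_cloudprovider`: the table `FARGATE_MEMORY_BY_CPU` and the helper `is_valid_fargate_size` are hypothetical names, and the table is an assumed snapshot of the Fargate combinations for CPU values up to 4096. The AWS page linked above remains the authoritative list and may include additional sizes.

```python
# Hypothetical pre-flight check; not part of dask_cloudprovider.
# Assumed snapshot of Fargate CPU units -> allowed memory values (MiB).
# Consult the AWS documentation for the current, authoritative table.
FARGATE_MEMORY_BY_CPU = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: int, mem: int) -> bool:
    """Return True if (cpu, mem) appears in the assumed Fargate table."""
    return mem in FARGATE_MEMORY_BY_CPU.get(cpu, ())


if __name__ == "__main__":
    # The combination from the troubleshooting example above.
    print(is_valid_fargate_size(256, 30720))
    # The same memory request paired with four vCPUs.
    print(is_valid_fargate_size(4096, 30720))
```

Under this table, the failing example's `worker_mem=30720` is only accepted together with `worker_cpu=4096`, and the documented defaults (`1024`/`4096` for the scheduler, `4096`/`16384` for workers) both pass the check.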
