Bug: concurrency causes Premature destruction of containers and networks (e.g. w/pytest-xdist) #567
Comments
What do you mean by "Give some time to prematurely stop"?
…On Thu, May 9, 2024, 2:57 PM Fabian Haenel ***@***.***> wrote:
*Describe the bug*
Testcontainers behaves incredibly flaky when running tests in parallel
with pytest-xdist. Specifically containers and networks are destroyed
prematurely resulting in network connections being interrupted, ports being
reassigned to different containers resulting in calls to the wrong
application or instance.
*To Reproduce*
Install the packages used for the tests.
pip install pytest==8.2 pytest-xdist==3.6 testcontainers==4.4 requests==2.31
Create a file named example.py
from time import sleep

import pytest
import requests
from testcontainers.core.network import Network
from testcontainers.postgres import PostgresContainer
from testcontainers.vault import VaultContainer


@pytest.mark.parametrize("run", range(30))
def test_healthcheck(run: int) -> None:
    with Network() as network:
        with PostgresContainer().with_network(network) as postgres_container:
            with VaultContainer().with_network(network).with_exposed_ports(8200) as vault_container:
                sleep(1)  # Give some time to prematurely stop
                response = requests.get(f"{vault_container.get_connection_url()}/v1/sys/health")
                assert response.status_code == 200
Run pytest with parallel test execution on a multicore CPU (at least 2
cores).
pytest -n auto example.py
*Runtime environment*
Linux 6.8.0-76060800daily20240311-generic
Python 3.10 3.11 3.12
CPU i7-1165G7
testcontainers 4.4
This is a minimal example to showcase the buggy behavior. In my actual tests the sleep is replaced by other operations that just take some time, during which the containers have sometimes already been stopped and disposed of when they should not have been (the context has not been left at that point). Additional information:
Sometimes I see errors as follows, which are caused by the vault container in my example being stopped mid-test execution. It did not crash; it was stopped and removed before it should have been. The issue is hard to reproduce due to its flaky nature: sometimes no tests fail, sometimes only one, sometimes many.
So I'm guessing something in the Ryuk class or something else is failing to start the containers under parallelism. I guess we could start surrounding parts of the code with
I'm going to go a bit off road here, @skeletorXVI are you familiar with Note Taken from https://pytest-xdist.readthedocs.io/en/latest/distribution.html
@alexanderankin I know this is not a proper fix, just trying to have some sort of workaround.
@skeletorXVI in your recollection, is a 30 second delay relevant here? As in, could the containers be cleaned up after a 30 second delay while your tests take > 30 seconds to run? This would be explained by Ryuk losing its connection to the application, after which it times out following a fixed delay (which iirc is 30 seconds).
The test suite in its entirety takes longer than 30 seconds, but no individual test gets even close to 30 seconds. @Tranquility2
I'm seeing this very same problem with Increasing concurrency seems to exacerbate the problem...
Looking around, it seems folks have had the same issue in other languages. Assuming all languages use the same underlying container, there seems to be an issue with the code that waits for the container to start up. Here is the fix they made for Rust. I'm going to monkeypatch our code and see if this resolves things locally.
OK - I can see how the Python implementation would cause some indeterminate errors. The fix was to wait for the log message on both stderr AND stdout, one for each boot cycle. The problem with the Python code is that it has an OR condition that only requires the line to be present in either log. Monkey patching this isn't the cleanest, but I can knock something up to prove the fix. A proper fix would require a more involved patch to
I worked around the issue with the following patch in our pytest
I can change it to an AND rather than an OR if you want; I observe it does get printed twice. I am trying to figure out how to observe whether it is stdout or stderr, but it does seem very plausible.
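The OR-versus-AND distinction discussed here can be sketched in plain Python without starting any containers. This is only an illustration: `logs_match`, its `require_both` flag, and the sample log strings are hypothetical and are not part of the testcontainers API, and which stream receives the message first is an assumption made for the example.

```python
import re


def logs_match(stdout: str, stderr: str, pattern: str, require_both: bool = False) -> bool:
    """Hypothetical helper: test a readiness pattern against two log streams.

    require_both=False mimics an OR check (either stream is enough).
    require_both=True mimics an AND check (both streams must contain it).
    """
    in_stdout = re.search(pattern, stdout) is not None
    in_stderr = re.search(pattern, stderr) is not None
    if require_both:
        return in_stdout and in_stderr
    return in_stdout or in_stderr


PATTERN = "database system is ready to accept connections"

# Assumed situation after the first boot cycle: only one stream has the line.
first_boot_stdout = ""
first_boot_stderr = "LOG:  database system is ready to accept connections\n"

# Assumed situation after the second boot cycle: the other stream has it too.
second_boot_stdout = "LOG:  database system is ready to accept connections\n"

# OR reports "ready" after the first boot cycle, which may be too early.
ready_too_early = logs_match(first_boot_stdout, first_boot_stderr, PATTERN)

# AND keeps waiting until the line has appeared on both streams.
still_waiting = not logs_match(first_boot_stdout, first_boot_stderr, PATTERN, require_both=True)
ready_after_restart = logs_match(second_boot_stdout, first_boot_stderr, PATTERN, require_both=True)
```

With an OR check the wait can return while the service is still going through its second boot cycle, which would be consistent with the intermittent failures reported in this thread.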
The trouble is that the method is used in other places in the codebase, so it's probably better to parameterize it somehow... that's the reason I didn't attempt a more comprehensive patch. But you folks know this library better than me, so whatever you think works. It fixes the issue anyway.
How about switching to a v2 with
Oh, I see: the issue is with the wait_for_logs function, not in the postgres module itself. Yeah, that's a bit tricky. I suspect that whole API will change with testcontainers v5 to look more like the OOP ones (e.g. Rust copies the "wait for" API).
#661 will make it so you can do AND with the current API.
From my testing, the following PR solved this bug: #678
Describe the bug
Testcontainers behaves incredibly flaky when running tests in parallel with pytest-xdist. Specifically, containers and networks are destroyed prematurely, resulting in network connections being interrupted and ports being reassigned to different containers, which in turn results in calls to the wrong application or instance.
To Reproduce
Note: due to the flakiness you might need to tune the number of runs or repeat the test execution to see the errors.
Install the packages used for the tests.
Create a file named example.py
Run pytest with parallel test execution on a multicore CPU (at least 2 cores).
Runtime environment
Linux 6.8.0-76060800daily20240311-generic
Python 3.10 3.11 3.12
CPU i7-1165G7
testcontainers 4.4