Closed
Description
Expected Behavior
File descriptor usage should be fairly constant while no syncs are running. During syncs, the usage should increase, and after it should decrease to what it was before.
Current Behavior
One of our users reported in Slack that his syncs were failing being scheduled. He attached logs which contained:
�[32mairbyte-scheduler |�[0m java.lang.RuntimeException: /data/config/STANDARD_SYNC: Too many open files
�[32mairbyte-scheduler |�[0m at io.airbyte.scheduler.JobScheduler.getAllActiveConnections(JobScheduler.java:120) ~[io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m at io.airbyte.scheduler.JobScheduler.scheduleSyncJobs(JobScheduler.java:92) ~[io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m at io.airbyte.scheduler.JobScheduler.run(JobScheduler.java:82) [io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m at io.airbyte.scheduler.SchedulerApp.lambda$start$0(SchedulerApp.java:95) [io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
�[32mairbyte-scheduler |�[0m at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
�[32mairbyte-scheduler |�[0m at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
�[32mairbyte-scheduler |�[0m at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
�[32mairbyte-scheduler |�[0m at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
�[32mairbyte-scheduler |�[0m at java.lang.Thread.run(Thread.java:832) [?:?]
�[32mairbyte-scheduler |�[0m Caused by: java.nio.file.FileSystemException: /data/config/STANDARD_SYNC: Too many open files
�[32mairbyte-scheduler |�[0m at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
�[32mairbyte-scheduler |�[0m at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
�[32mairbyte-scheduler |�[0m at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
�[32mairbyte-scheduler |�[0m at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432) ~[?:?]
�[32mairbyte-scheduler |�[0m at java.nio.file.Files.newDirectoryStream(Files.java:476) ~[?:?]
�[32mairbyte-scheduler |�[0m at java.nio.file.Files.list(Files.java:3765) ~[?:?]
�[32mairbyte-scheduler |�[0m at io.airbyte.config.persistence.DefaultConfigPersistence.listConfigsInternal(DefaultConfigPersistence.java:112) ~[io.airbyte.airbyte-config-persistence-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m at io.airbyte.config.persistence.DefaultConfigPersistence.listConfigs(DefaultConfigPersistence.java:74) ~[io.airbyte.airbyte-config-persistence-0.9.1-alpha.jar:?]
Steps to Reproduce
I was able to reproduce some form of the problem locally on my Mac laptop.
- From a fresh Docker environment
docker-compose up
- Then I installed lsof and monitored its output every few seconds below:
# jrhizor in ~/code/airbyte on git:master ● [10:16:03]
→ docker-compose exec scheduler /bin/bash -c "apt-get install lsof && lsof | wc -l"
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
perl
The following NEW packages will be installed:
lsof
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.
Need to get 319 kB of archives.
After this operation, 465 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 lsof amd64 4.91+dfsg-1 [319 kB]
Fetched 319 kB in 0s (2989 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package lsof.
(Reading database ... 11610 files and directories currently installed.)
Preparing to unpack .../lsof_4.91+dfsg-1_amd64.deb ...
Unpacking lsof (4.91+dfsg-1) ...
Setting up lsof (4.91+dfsg-1) ...
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:25]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< It stays constant at 3k if no jobs have been scheduled.
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:32]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:35]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:50]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< this is after some check connection/setup stuff
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:35]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:36]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5150
# jrhizor in ~/code/airbyte on git:master ● [10:17:59]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5013
# jrhizor in ~/code/airbyte on git:master ● [10:18:00]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< this spike was a exchangerate to csv sync
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:02]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:03]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:05]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6237
# jrhizor in ~/code/airbyte on git:master ● [10:18:06]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5193
# jrhizor in ~/code/airbyte on git:master ● [10:18:09]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5193
# jrhizor in ~/code/airbyte on git:master ● [10:18:10]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5265
# jrhizor in ~/code/airbyte on git:master ● [10:18:12]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5265
# jrhizor in ~/code/airbyte on git:master ● [10:18:14]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5337
# jrhizor in ~/code/airbyte on git:master ● [10:18:19]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5388
# jrhizor in ~/code/airbyte on git:master ● [10:18:21]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5409
# jrhizor in ~/code/airbyte on git:master ● [10:18:23]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5409
# jrhizor in ~/code/airbyte on git:master ● [10:18:24]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5479
# jrhizor in ~/code/airbyte on git:master ● [10:18:26]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:27]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:29]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5553
# jrhizor in ~/code/airbyte on git:master ● [10:18:32]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5553
# jrhizor in ~/code/airbyte on git:master ● [10:18:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5625
# jrhizor in ~/code/airbyte on git:master ● [10:18:36]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5625
# jrhizor in ~/code/airbyte on git:master ● [10:18:48]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6057
# jrhizor in ~/code/airbyte on git:master ● [10:19:06]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5725
# jrhizor in ~/code/airbyte on git:master ● [10:19:08]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5692
# jrhizor in ~/code/airbyte on git:master ● [10:19:50]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5878
# jrhizor in ~/code/airbyte on git:master ● [10:20:05]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:20:07]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:20:08]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:22:28]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
7676
# jrhizor in ~/code/airbyte on git:master ● [10:22:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
11365
# jrhizor in ~/code/airbyte on git:master ● [10:27:26]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
14807
If you do this for the server
, it doesn't just keep growing. It grows when it's accessing config files and then decreases back to the steady state.
On the scheduler
, a lot of the open descriptors look like:
java 1 75 worker-1 root 150r DIR 254,1 4096 2110906 /data/config/STANDARD_SYNC
Severity of the bug for you
High