Skip to content

file descriptor limit reached  #1470

Closed
@jrhizor

Description

@jrhizor

Expected Behavior

File descriptor usage should be fairly constant while no syncs are running. During syncs, the usage should increase, and after it should decrease to what it was before.

Current Behavior

One of our users reported in Slack that his syncs were failing being scheduled. He attached logs which contained:

�[32mairbyte-scheduler |�[0m java.lang.RuntimeException: /data/config/STANDARD_SYNC: Too many open files
�[32mairbyte-scheduler |�[0m 	at io.airbyte.scheduler.JobScheduler.getAllActiveConnections(JobScheduler.java:120) ~[io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m 	at io.airbyte.scheduler.JobScheduler.scheduleSyncJobs(JobScheduler.java:92) ~[io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m 	at io.airbyte.scheduler.JobScheduler.run(JobScheduler.java:82) [io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m 	at io.airbyte.scheduler.SchedulerApp.lambda$start$0(SchedulerApp.java:95) [io.airbyte-airbyte-scheduler-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
�[32mairbyte-scheduler |�[0m 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
�[32mairbyte-scheduler |�[0m 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
�[32mairbyte-scheduler |�[0m 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
�[32mairbyte-scheduler |�[0m 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
�[32mairbyte-scheduler |�[0m 	at java.lang.Thread.run(Thread.java:832) [?:?]
�[32mairbyte-scheduler |�[0m Caused by: java.nio.file.FileSystemException: /data/config/STANDARD_SYNC: Too many open files
�[32mairbyte-scheduler |�[0m 	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:100) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:432) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at java.nio.file.Files.newDirectoryStream(Files.java:476) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at java.nio.file.Files.list(Files.java:3765) ~[?:?]
�[32mairbyte-scheduler |�[0m 	at io.airbyte.config.persistence.DefaultConfigPersistence.listConfigsInternal(DefaultConfigPersistence.java:112) ~[io.airbyte.airbyte-config-persistence-0.9.1-alpha.jar:?]
�[32mairbyte-scheduler |�[0m 	at io.airbyte.config.persistence.DefaultConfigPersistence.listConfigs(DefaultConfigPersistence.java:74) ~[io.airbyte.airbyte-config-persistence-0.9.1-alpha.jar:?]

Steps to Reproduce

I was able to reproduce some form of the problem locally on my Mac laptop.

  1. From a fresh Docker environment docker-compose up
  2. Then I installed lsof and monitored its output every few seconds below:
# jrhizor in ~/code/airbyte on git:master ● [10:16:03]
→ docker-compose exec scheduler /bin/bash -c "apt-get install lsof && lsof | wc -l"
Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  perl
The following NEW packages will be installed:
  lsof
0 upgraded, 1 newly installed, 0 to remove and 7 not upgraded.
Need to get 319 kB of archives.
After this operation, 465 kB of additional disk space will be used.
Get:1 http://deb.debian.org/debian buster/main amd64 lsof amd64 4.91+dfsg-1 [319 kB]
Fetched 319 kB in 0s (2989 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package lsof.
(Reading database ... 11610 files and directories currently installed.)
Preparing to unpack .../lsof_4.91+dfsg-1_amd64.deb ...
Unpacking lsof (4.91+dfsg-1) ...
Setting up lsof (4.91+dfsg-1) ...
3331

# jrhizor in ~/code/airbyte on git:master ● [10:16:25]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< It stays constant at 3k if no jobs have been scheduled.
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:32]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:35]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
3331
# jrhizor in ~/code/airbyte on git:master ● [10:16:50]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< this is after some check connection/setup stuff
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:35]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
4806
# jrhizor in ~/code/airbyte on git:master ● [10:17:36]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5150
# jrhizor in ~/code/airbyte on git:master ● [10:17:59]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5013
# jrhizor in ~/code/airbyte on git:master ● [10:18:00]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< this spike was a exchangerate to csv sync
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:02]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:03]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6159
# jrhizor in ~/code/airbyte on git:master ● [10:18:05]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6237
# jrhizor in ~/code/airbyte on git:master ● [10:18:06]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5193
# jrhizor in ~/code/airbyte on git:master ● [10:18:09]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5193
# jrhizor in ~/code/airbyte on git:master ● [10:18:10]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5265
# jrhizor in ~/code/airbyte on git:master ● [10:18:12]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5265
# jrhizor in ~/code/airbyte on git:master ● [10:18:14]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5337
# jrhizor in ~/code/airbyte on git:master ● [10:18:19]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5388
# jrhizor in ~/code/airbyte on git:master ● [10:18:21]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5409
# jrhizor in ~/code/airbyte on git:master ● [10:18:23]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5409
# jrhizor in ~/code/airbyte on git:master ● [10:18:24]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5479
# jrhizor in ~/code/airbyte on git:master ● [10:18:26]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:27]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:29]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5481
# jrhizor in ~/code/airbyte on git:master ● [10:18:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5553
# jrhizor in ~/code/airbyte on git:master ● [10:18:32]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5553
# jrhizor in ~/code/airbyte on git:master ● [10:18:33]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5625
# jrhizor in ~/code/airbyte on git:master ● [10:18:36]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5625
# jrhizor in ~/code/airbyte on git:master ● [10:18:48]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
6057
# jrhizor in ~/code/airbyte on git:master ● [10:19:06]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" 
5725
# jrhizor in ~/code/airbyte on git:master ● [10:19:08]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5692
# jrhizor in ~/code/airbyte on git:master ● [10:19:50]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5878
# jrhizor in ~/code/airbyte on git:master ● [10:20:05]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:20:07]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:20:08]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l"
5940
# jrhizor in ~/code/airbyte on git:master ● [10:22:28]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
7676
# jrhizor in ~/code/airbyte on git:master ● [10:22:30]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
11365
# jrhizor in ~/code/airbyte on git:master ● [10:27:26]
→ docker-compose exec scheduler /bin/bash -c "lsof | wc -l" <<<< a few minutes later
14807

If you do this for the server, it doesn't just keep growing. It grows when it's accessing config files and then decreases back to the steady state.

On the scheduler, a lot of the open descriptors look like:

java      1  75 worker-1  root  150r   DIR              254,1      4096 2110906 /data/config/STANDARD_SYNC

Severity of the bug for you

High

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions