datastore: block flatfs has too many buckets. #3062
Comments
Related: Repo GC doesn't clean up empty directories in the block database.
Adding an anecdote here: I once ran into a "Too many links" error on a FreeBSD system when adding a lot of files to ipfs on UFS. For reference, it was an old 32-bit test machine running, I think, FreeBSD 9 and ipfs 3.x at the time. It couldn't handle all the directories under blocks/.
Which version are you running? The total prefix length and the amount of entropy in that prefix differ between versions, because every key starts with the same multihash prefix. In go-ipfs 0.4.3-X there are 512 buckets in flatfs, which is only a bit more than the 256 that Git uses. This is also different from what was used in 0.4.2 and before.
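To illustrate how the multihash prefix eats into the entropy of a bucket name, here is a small Go sketch. It assumes the keys are the unpadded standard base32 encoding of a sha2-256 multihash (header bytes 0x12 0x20) and that the bucket is simply the first n characters of the key; `multihashSHA256`, `bucketKey` and `distinctPrefixes` are hypothetical helpers, not go-ipfs code.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base32"
	"fmt"
)

// multihashSHA256 prepends the sha2-256 multihash header
// (function code 0x12, digest length 0x20) to the SHA-256 digest of data.
func multihashSHA256(data []byte) []byte {
	sum := sha256.Sum256(data)
	return append([]byte{0x12, 0x20}, sum[:]...)
}

// bucketKey encodes the multihash as unpadded standard base32 and returns
// its first n characters, a hypothetical stand-in for the flatfs prefix.
func bucketKey(mh []byte, n int) string {
	enc := base32.StdEncoding.WithPadding(base32.NoPadding).EncodeToString(mh)
	return enc[:n]
}

// distinctPrefixes hashes `samples` random blocks and counts how many
// distinct n-character prefixes (buckets) they fall into.
func distinctPrefixes(n, samples int) int {
	seen := make(map[string]struct{})
	buf := make([]byte, 64)
	for i := 0; i < samples; i++ {
		if _, err := rand.Read(buf); err != nil {
			panic(err)
		}
		seen[bucketKey(multihashSHA256(buf), n)] = struct{}{}
	}
	return len(seen)
}

func main() {
	for n := 3; n <= 5; n++ {
		fmt.Printf("prefix length %d: %d distinct buckets\n", n, distinctPrefixes(n, 200000))
	}
}
```

Under these assumptions the first three characters are always "CIQ" (one bucket), the fourth character adds only 4 bits of hash entropy (16 buckets), and a 5-character prefix yields 512 buckets, matching the count quoted for 0.4.3.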
@Kubuxu I'm using go-ipfs version 0.4.2 (…
Yeah, I warned about this before. I think the verdict was "any filesystem …". We can implement a migration to make the fanout smaller and deeper. Though …
@jbenet, it might not be an issue after all: we already migrated from a 16-bit fanout to a 9-bit one (the 5-character prefix), limiting the number of buckets from 65k to 512.
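The two bucket counts in this comment follow from simple arithmetic, assuming 0.4.2 used hex-encoded keys with an 8-character prefix, 0.4.3 uses base32 keys with a 5-character prefix, and every key starts with the 16-bit sha2-256 multihash header (0x12 0x20). With w bits per key character and a k-character prefix, the bucket name carries b(k) bits of entropy:

```latex
\begin{aligned}
b(k) &= w\,k - 16, \qquad N_{\text{buckets}} = 2^{\,b(k)} \quad (w\,k \ge 16)\\
\text{hex } (w = 4),\ k = 8 &:\quad b = 32 - 16 = 16,\quad N = 2^{16} = 65536\\
\text{base32 } (w = 5),\ k = 5 &:\quad b = 25 - 16 = 9,\quad N = 2^{9} = 512
\end{aligned}
```

Each extra prefix character multiplies the bucket count by 2^w, which is why a single flat level can only be tuned in coarse steps.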
@jbenet Actually, in 0.4.2 this was a huge problem: all the buckets caused slowdowns on the order of 8-10x compared with an empty repo. It's much less of a problem in 0.4.3, but it still doesn't scale to terabytes of data in the repo.
Yes, for that we would need to implement a variable fanout that depends on the number of blocks stored in the previous level. It will be complex, because it has to handle both expanding and contracting the fanout.
We could just expand for now and implement compaction when it is needed.
But yes, we really need it to be variable-sized. Guess who implemented this years ago. Too bad it's in the wrong language :) and not auto-expanding.
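The thread only outlines the idea of a variable, auto-expanding fanout. Below is a minimal in-memory Go sketch of the expansion-only variant suggested above: a directory holds block keys directly until it grows past a threshold, then splits into subdirectories named by the next two hex characters of each key. It assumes hex keys and a tiny threshold for demonstration; `shard`, `maxEntries`, `put`, `split`, `lookup` and `hexKey` are hypothetical names, not flatfs APIs, and nothing is written to disk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxEntries is a hypothetical split threshold, kept tiny so the demo splits.
const maxEntries = 4

// shard models one directory level of a hypothetical auto-expanding layout.
type shard struct {
	depth    int               // number of 2-char prefix levels above this directory
	entries  map[string]bool   // block keys stored directly here (leaf directory)
	children map[string]*shard // set once this directory has been split
}

func newShard(depth int) *shard {
	return &shard{depth: depth, entries: make(map[string]bool)}
}

// prefixAt returns the two hex characters that select a child at this depth.
func (s *shard) prefixAt(key string) string {
	return key[2*s.depth : 2*s.depth+2]
}

// put stores key, splitting this directory once it grows past maxEntries.
func (s *shard) put(key string) {
	if s.children != nil {
		p := s.prefixAt(key)
		child, ok := s.children[p]
		if !ok {
			child = newShard(s.depth + 1)
			s.children[p] = child
		}
		child.put(key)
		return
	}
	s.entries[key] = true
	if len(s.entries) > maxEntries {
		s.split()
	}
}

// split pushes every directly stored key one level down. Only expansion is
// modeled; contraction (merging sparse children back) is left out.
func (s *shard) split() {
	old := s.entries
	s.entries = nil
	s.children = make(map[string]*shard)
	for k := range old {
		s.put(k)
	}
}

// lookup returns the nested path of key (e.g. "ab/cd/abcd...") and whether it exists.
func (s *shard) lookup(key string) (string, bool) {
	if s.children == nil {
		return key, s.entries[key]
	}
	p := s.prefixAt(key)
	child, ok := s.children[p]
	if !ok {
		return "", false
	}
	sub, found := child.lookup(key)
	return p + "/" + sub, found
}

// hexKey derives a 64-character hex key from an integer, standing in for a block hash.
func hexKey(i int) string {
	sum := sha256.Sum256([]byte(fmt.Sprint(i)))
	return hex.EncodeToString(sum[:])
}

func main() {
	root := newShard(0)
	for i := 0; i < 1000; i++ {
		root.put(hexKey(i))
	}
	p, ok := root.lookup(hexKey(42))
	fmt.Println(p, ok)
}
```

Because splits happen per directory, densely populated prefixes get deeper while sparse ones stay shallow; the missing piece the thread mentions is the reverse operation, collapsing children when blocks are garbage-collected.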
Version/Platform/Processor information: 0.4.2/Windows 10 x64
Type: bug/enhancement
Area: blockstore
Priority: P0/P1
Description:
The current flatfs format for blocks has far too many buckets. Most filesystems slow down drastically as the number of entries in a directory increases.
Git uses 2 hexadecimal characters for its buckets (256 directories). I believe 2 or 3 characters is the most a filesystem like Windows's NTFS can handle without a drastic slowdown.
Alternatively, the blockstore could use several levels of nested directories, as in blocks/1a/2b/3c4d5e...9f.data
When adding a lot of objects, for example a 20-gigabyte file, the current scheme of 8 characters per bucket leaves an average of 2 .data files per bucket, and the sheer number of buckets slows down block operations by at least a factor of 8 (on NTFS).
Note that I didn't mark the title with the issue type, because I'm not quite sure what type of issue this is; I consider it a suggestion.
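A back-of-the-envelope check of these numbers, as a Go sketch. It assumes the file is 20 GiB, split into 256 KiB leaf blocks (go-ipfs's default chunk size), with blocks spread uniformly over 2^bits buckets; intermediate DAG nodes are ignored, and `avgBlocksPerBucket` is a hypothetical helper.

```go
package main

import "fmt"

// chunkSize is the assumed leaf block size: 256 KiB.
const chunkSize = 256 * 1024

// avgBlocksPerBucket estimates how many leaf blocks land in each bucket when
// a file of fileBytes is chunked and its blocks are spread evenly over
// 2^entropyBits buckets. Intermediate (non-leaf) blocks are ignored.
func avgBlocksPerBucket(fileBytes int64, entropyBits uint) float64 {
	blocks := (fileBytes + chunkSize - 1) / chunkSize // round up to whole chunks
	buckets := int64(1) << entropyBits
	return float64(blocks) / float64(buckets)
}

func main() {
	const twentyGiB int64 = 20 << 30
	for _, bits := range []uint{16, 9, 8} {
		fmt.Printf("%2d bits (%5d buckets): %.2f blocks per bucket\n",
			bits, int64(1)<<bits, avgBlocksPerBucket(twentyGiB, bits))
	}
}
```

This prints 1.25, 160.00 and 320.00 blocks per bucket for 16, 9 and 8 bits respectively. The 16-bit figure is on the same order as the issue's estimate of about two files per bucket; the 8- and 9-bit rows show the trade-off the thread is debating: far fewer directories, but hundreds of files in each.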