datastore: block flatfs has too many buckets. #3062

Open
mateon1 opened this issue Aug 8, 2016 · 10 comments
Labels
status/deferred (Conscious decision to pause or backlog), topic/repo (Topic repo)

Comments

@mateon1
Contributor

mateon1 commented Aug 8, 2016

Version/Platform/Processor information: 0.4.2/Windows 10 x64

Type: bug/enhancement
Area: blockstore
Priority: P0/P1

Description:
The current flatdb format for blocks has way too many buckets. Most filesystems slow down drastically as the number of objects in a directory increases.

Git uses 2 hexadecimal characters for buckets. I believe 2 or 3 is the most a filesystem like Windows's NTFS can handle without drastic slowdown.
Alternatively, the blockdb could use several directories, as in blocks/1a/2b/3c4d5e...9f.data
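
To make the two layouts concrete, here is a minimal sketch of how a block key could map to a path under each scheme. The function names, the `.data` suffix handling, and the default widths are illustrative assumptions, not go-ipfs code; the nested variant reproduces the `blocks/1a/2b/3c4d5e...9f.data` shape suggested above.

```python
from pathlib import PurePosixPath


def flat_path(key_hex: str, prefix_len: int = 8) -> PurePosixPath:
    """Single-level layout: one bucket named after the first prefix_len characters.

    Assumption: the file keeps the full key as its name.
    """
    return PurePosixPath("blocks") / key_hex[:prefix_len] / f"{key_hex}.data"


def nested_path(key_hex: str, levels: int = 2, width: int = 2) -> PurePosixPath:
    """Multi-level layout: `levels` directories of `width` characters each,
    with the remainder of the key used as the file name."""
    if len(key_hex) <= levels * width:
        raise ValueError("key too short for the requested fanout")
    parts = [key_hex[i * width:(i + 1) * width] for i in range(levels)]
    rest = key_hex[levels * width:]
    return PurePosixPath("blocks", *parts, f"{rest}.data")


if __name__ == "__main__":
    key = "1a2b3c4d5e9f"  # shortened, made-up key for illustration
    print(flat_path(key))
    print(nested_path(key))
```

With two levels of two hex characters, each directory holds at most 256 subdirectories, which matches the per-level fanout git uses.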

When adding a lot of objects, for example a 20 gigabyte file, the current scheme of 8 characters per bucket yields an average of 2 .data files per bucket, and the sheer number of buckets slows down block operations by at least a factor of 8 (NTFS).


Note that I didn't mark the title with the issue type. That is because I'm not quite sure what type of issue this is; I consider it a suggestion.

@mateon1
Contributor Author

mateon1 commented Aug 8, 2016

Related: Repo GC doesn't clean up empty directories in the block database.
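
For illustration, cleaning up empty bucket directories after a GC pass could look roughly like the sketch below. This is a generic filesystem walk, not how go-ipfs implements GC; the function name and return value are assumptions made for the example.

```python
import os


def remove_empty_dirs(root: str) -> list[str]:
    """Remove empty directories below root (root itself is kept).

    Walks bottom-up so that a parent emptied by removing its children
    is removed in the same pass. Returns the paths that were removed.
    """
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        try:
            os.rmdir(dirpath)  # raises OSError if the directory is not empty
        except OSError:
            continue
        removed.append(dirpath)
    return removed
```

Relying on `os.rmdir` failing for non-empty directories avoids a separate emptiness check that could race with concurrent writes.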

@djdv
Contributor

djdv commented Aug 9, 2016

Adding an anecdote here, I've run into a "Too many links" issue on a FreeBSD system before when adding a lot of files to ipfs on UFS. For reference it was an old 32bit testing machine running I think fbsd 9 and ipfs 3.x at the time. It couldn't handle all the directories under blocks/.

@Kubuxu changed the title from "Block flatdb has too many buckets." to "datastore: block flatfs has too many buckets." on Aug 9, 2016
@Kubuxu
Member

Kubuxu commented Aug 9, 2016

Which version are you running?

The total prefix length differs from the amount of entropy in that prefix, because of the multihash prefix. In go-ipfs 0.4.3-X there are 512 buckets in the flatfs, which is only a bit more than git uses. This is also different from what was used in 0.4.2 and before.

@mateon1
Contributor Author

mateon1 commented Aug 9, 2016

@Kubuxu I'm using go-ipfs version 0.4.2 (version --all fails on this one), the prefix length in my database is 8 hex characters, with the first 4 of them being constant.
I'm not sure how to install and test the 0.4.3 version on Windows, but I'll look in the docs and ask on IRC.
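
The numbers in this exchange can be checked with a little arithmetic. The sketch below assumes that the varying prefix characters are uniformly distributed (reasonable for hash output) and that the constant characters contribute nothing; the helper name is made up for the example.

```python
import math


def bucket_count(variable_chars: int, alphabet_size: int = 16) -> int:
    """Number of distinct buckets for a prefix with this many varying characters."""
    return alphabet_size ** variable_chars


# 0.4.2 layout as reported here: 8 hex characters, the first 4 constant.
old_buckets = bucket_count(8 - 4)
# git: 2 hex characters.
git_buckets = bucket_count(2)
# 0.4.3 figure quoted earlier in the thread.
new_buckets = 512

print(old_buckets, git_buckets, new_buckets, math.log2(new_buckets))
# prints: 65536 256 512 9.0
```

So the 0.4.2 layout has 256 times as many buckets as git, while 512 buckets is twice git's count.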

@whyrusleeping added the topic/repo label on Aug 9, 2016
@jbenet
Member

jbenet commented Aug 11, 2016

Yeah, I warned about this before. I think the verdict was "any filesystem will do fine on this, you're worrying over nothing".

We can implement a migration to make the fanout smaller and deeper. Though if there are serious perf implications, I wonder if it should be a config-dependent thing. That may be error prone. Maybe just change the fanout and gun for arena storage.

@Kubuxu
Member

Kubuxu commented Aug 11, 2016

@jbenet it might not be an issue after all: we already migrated from a 16-bit fanout to a 5-bit one, limiting the number of buckets from 65k to 512.

@mateon1
Contributor Author

mateon1 commented Aug 11, 2016

@jbenet Actually, in 0.4.2, this was a huge problem, with all the buckets causing massive slowdowns on the order of 8-10x slower than an empty repo. It's way less of a problem in 0.4.3, but it still doesn't scale to terabytes in the repo.

@Kubuxu
Member

Kubuxu commented Aug 11, 2016

Yes, for that we would need to implement a variable fanout that depends on the number of blocks stored in the previous level. It will be complex, as you need to account for both expanding and contracting the fanout.

@jbenet
Member

jbenet commented Aug 12, 2016

> Yes, for that we would need to implement a variable fanout that depends on the number of blocks stored in the previous level. It will be complex, as you need to account for both expanding and contracting the fanout.

We could just expand for now and implement compaction when it is needed.
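
As a rough illustration of the expand-only idea, the in-memory sketch below keeps keys in a bucket until it exceeds a threshold, then splits it into child buckets keyed by the next characters of the key. It never contracts. The class name, the threshold, and the assumption that all keys have the same length are hypothetical; a real datastore would also have to move files on disk and handle concurrent access.

```python
class ExpandingShard:
    """Bucket that splits into sub-buckets once it holds too many keys."""

    def __init__(self, max_entries=256, width=2, depth=0):
        self.max_entries = max_entries
        self.width = width      # characters consumed per directory level
        self.depth = depth      # how many levels sit above this bucket
        self.keys = set()       # keys stored directly in this bucket
        self.children = None    # slot -> ExpandingShard, once split

    def _slot(self, key):
        start = self.depth * self.width
        return key[start:start + self.width]

    def _child(self, key):
        slot = self._slot(key)
        if slot not in self.children:
            self.children[slot] = ExpandingShard(
                self.max_entries, self.width, self.depth + 1)
        return self.children[slot]

    def add(self, key):
        if self.children is not None:
            self._child(key).add(key)
            return
        self.keys.add(key)
        # Split only if the key still has characters left for another level.
        can_split = len(key) > (self.depth + 1) * self.width
        if len(self.keys) > self.max_entries and can_split:
            pending, self.keys = self.keys, set()
            self.children = {}
            for k in pending:
                self._child(k).add(k)

    def path(self, key):
        """Relative path at which `key` is (or would be) stored."""
        if self.children is None:
            return f"{key}.data"
        child = self.children.get(self._slot(key))
        tail = child.path(key) if child else f"{key}.data"
        return f"{self._slot(key)}/{tail}"
```

Because only overfull buckets split, a small repo stays flat while a large one grows deeper where the keys actually land; compaction (merging sparse children back into their parent) is left out, as suggested above.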

@jbenet
Member

jbenet commented Aug 12, 2016

but yes, we really need it to be variable sized. guess who implemented this years ago. too bad it's the wrong language :) and not auto-expanding.

@whyrusleeping added the status/deferred label on Sep 14, 2016
5 participants