Skip to content

Add IPFS Kademlia DHT Specification #497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Add IPFS Kademlia DHT Specification #497

wants to merge 22 commits into from

Conversation

guillaumemichel
Copy link
Contributor

Long overdue

Fixes #345

Copy link

github-actions bot commented Mar 18, 2025

🚀 Build Preview on IPFS ready

@guillaumemichel guillaumemichel force-pushed the dht branch 2 times, most recently from de8a611 to 9215300 Compare March 18, 2025 16:47
Comment on lines +262 to +265
DHT Servers SHOULD NOT return their own Peer ID in responses to `FIND_NODE`
queries. However, they MUST include information about the requester, if and
only if the requester is a DHT Server in its routing table and it is among the
`k` closest nodes to the target key.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In most cases, returning information about self (DHT Server) or requester (DHT Client) will be useless for the request, since the DHT Client already knows about both. The main argument of not including these addresses is to save bytes on the wire.

I don't see a use case where it would be useful that the DHT Server sends information about itself, since listen addresses and supported protocols should be exchanged using libp2p identify.

In some cases, it could be useful for the client (if it is a DHT Server) to know whether or not it is included in the Server's routing table, or which addresses are advertised. Information about the requester will essentially be present when a node is looking up itself when refreshing its routing table, and having this information provides a guarantee that the peer is actually routable. In a network that is large enough, it is very unlikely that the requester (being a DHT Server) will be among the k closest nodes to a key (other than self) being looked up.

Alternatively is possible to get this information by starting a fresh DHT Client and requesting about the initial peer, but it is a bit cumbersome.

also see libp2p/specs#535

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think that including the DHT server is mostly useless and takes space on the wire. It isn't useful when looking for one specific key, or the X closest peers to a key.

The only use case I can see if for when DHT servers try to see whether they are included in another DHT server's routing table, and what information is stored about them. This would be useful only for crawlers to gain statistics on the network. The exact same information can be retrieved using another peer id.

Hence I would suggest that the closestPeers field shouldn't contain neither self (DHT server), nor the requester's information. It self AND/OR requester are among the k closest peers to the requested key in the DHT server's routing table, then they should be replaced by the next closest peer, so that the response always contains k peers (if network is large enough).

It means that the current implementations may not be compliant with this spec change, but it should be easy to address.

WDYT @achingbrain ?

Related issues:

@guillaumemichel guillaumemichel marked this pull request as ready for review March 19, 2025 10:29
lidel added 3 commits March 24, 2025 23:17
will do for now, we should support this in generator at some point
Copy link
Member

@lidel lidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @guillaumemichel for documenting this ❤️

Made a first pass and pushed small cosmetics (fetch my changes) + some questions / suggestions in comments inline.

I think its ready for wider feedback, and if no concerns landing it in a few weeks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: document uses "recommended" multiple times, are those all in the spirit of RECOMMENDED from RFC 2119 (means SHOULD)?

Or should (not pun intended) we replace those with specific SHOULD/MAY per case basis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the "recommended" are for values/strategies that can be defined freely by implementers, they are not part of the protocol. In this case, "recommended" is more of a suggestion for values/strategies already used in implementations, e.g a safe default. Implementers are free to choose different values/strategies if they have other needs.

Hence the nuance should (not pun intended) be less than SHOULD but more than MAY. I think SHOULD is closer though.

I can either:

  • Replace the occurrences with SHOULD
  • Rephrase to suggested or safe default.

@lidel lidel changed the title kad: adding DHT spec Add IPFS Kademlia DHT Specification Mar 24, 2025
Copy link
Contributor

@aschmahmann aschmahmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting here, it's much needed 🙏. Most of my comments fall into:

  1. What should be here vs libp2p kad spec
  2. If we've covered all the things we need to here

and can be joined by using the [public good Amino DHT Bootstrappers](https://docs.ipfs.tech/concepts/public-utilities/#amino-dht-bootstrappers).
:::

### Client and Server Mode
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this make sense in this spec rather than the libp2p one? Is the advice around client / server resource usage not generally applicable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the similar tradeoff to "RPC messages" – duplicating important content here is intentional, aims to improve readability, creates coherent document, and protects us from feature creep.

The need for jumping from spec HTML into markdown in libp2p/specs repo should be minimized.

Long term, protection from unintended feature creep is important: someone changes libp2p specs and adds new RPCs, we dont want that to magically apply to Amino DHT.

As prior art example, I copied part of PeerID specs and placed them in https://specs.ipfs.tech/ipns/ipns-record/#ipns-keys to ensure that if libp2p adds new key type, or extends protobuf, that key type does not automatically become part of IPNS specs, but needs an explicit IPIP to add it.

protocol identifier nor offer the libp2p Kademlia protocol
identifier for incoming streams.

## Kademlia Keyspace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is general, right?

Comment on lines +169 to +172
The IPFS Kademlia DHT uses a bucket size of `k = 20`. This corresponds to the
`k` value as defined in the original Kademlia paper [`[0]`](#bibliography). The `k` value is also
used as a replication factor and defines how many peers are returned to a
lookup request.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph seems specific to IPFS, is the rest of this section?

### Relation to the Amino DHT

Nodes participating in the public [Amino DHT Swarm](#amino-dht) MUST implement the
IPFS Kademlia DHT specification. The IPFS Kademlia DHT specification MAY be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just own that this document is the Amino DHT specification and that similar variants have been known to be deployed elsewhere (e.g. LANs, CJDNS, etc.) with different parameters (particularly where IP filtering, etc. comes into play).

Copy link
Contributor Author

@guillaumemichel guillaumemichel Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the Amino DHT is just a swarm of peers. It is an instantiation of this spec, and not a spec by itself. There could be other swarms of peers implementing the {IPFS Kademlia DHT, Amino DHT} spec, but they would NOT be the Amino DHT (the only difference would be the libp2p protocol identifier). Other networks are using go-libp2p-kad-dht, which is an implementation of the IPFS Kademlia DHT, but their swarms are not the Amino DHT. Renaming the spec to Amino DHT could be confusing.

Also, some details about the LAN DHTs are included as part of this spec, since LAN DHTs are (or should be) part of the IPFS Specs. There is a large overlap in the behavior of the Amino DHT with the LAN DHTs (only difference is probably addresses filtering).

If it makes things clearer, I can rename this doc "Amino DHT Spec", and specify that using this spec and changing the libp2p protocol identifier is possible and would create a new DHT swarm. We could then add a dedicated LAN DHT spec, describing the diff with the Amino DHT Spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a benefit of keeping it as "IPFS Kademlia DHT" ans then showing it can multiple independent instances/swarms.

We really want the spec to show there is a surface for rolling their own DHT swarm by deploying own protocol identifier, to avoid accidental deployments of useless nodes to Amino (real risk, we had at least one incident in recent years).

I think we should clarify "libp2p Protocol Identifier" section by moving "Relation to" sections under it, and:

  • rephrase "Relation to Amino DHT" section to note "Amino DHT" is an instance of the "IPFS Kademlia DHT" spec mounted under /ipfs/kad/1.0.0 libp2p protocol
  • add "Relation to the Amino LAN DHT" section and note that LAN-specific version of Amino is
    mounted under /ipfs/lan/kad/1.0.0 libp2p protocol
  • add section "Creating Custom DHT Swarm" and suggest convention /<swarm-prefix>/kad/<version> there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified in cd85a7f.

Note that I left the "Relation to" sections since the goal is to inform the reader early about the nature of this document vs notion they may already have of the libp2p/Amino DHT. Happy to change if confusing.

to the routing table if any of its addresses already exceed the allowed
representation within the table.

### Routing Table Refresh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems general with the exception of any numbers being recommended.

the local node's Peer ID, ensuring the DHT Server maintains up-to-date
information on its closest peers.

## Lookup Process
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems shared/general


### Relation to libp2p kad-dht

The IPFS Kademlia DHT specification is a specialization of the [libp2p Kademlia DHT](https://github.com/libp2p/specs/tree/master/kad-dht).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like an awful lot of this document isn't describing anything specialized or specific to IPFS or the Amino DHT and is general to the libp2p kad spec.

What's the approach we're trying to take here:

  1. Only add IPFS / Amino -specific info
  2. Copy-paste an independent document so IPFS / Amino users don't need to look at the libp2p kad spec
  3. Make a mostly independent IPFS / Amino doc so users only have to look at the libp2p one for levels of detail exceeding some threshold
  4. Something else

This paragraph makes it sound like 1 is the approach we're taking but the doc looks like 2 or 3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH I find the libp2p kad-dht spec quite confusing. In a way it just provides the minimal details for developers to build their own swarm specs and implementations on top of libp2p, but it also provides application specific details (IPFS focused).

I tried to make this document a stand-alone spec, so that we don't depend on the (IMO confusing) libp2p spec. It would be 2., mirroring the fundamental parts and being willingly IPFS specific where required.

I don't know what is the future of the libp2p kad-dht spec, but once we have an IPFS DHT spec, if deemed impactful we could iterate to remove content that is IPFS specific in the libp2p DHT, and after the libp2p DHT spec has become a framework for instantiating other DHT specs, we could remove redundant part from the IPFS DHT spec.

I am not sure to what extent it makes sense to have a libp2p DHT spec, e.g defining specific RPCs, wire formats, etc. because libp2p based DHT networks don't need to be compatible with each other. The only common item is peer routing (GetClosestPeers), but even for that other networks may want to choose their own wire format. Devs of other libp2p networks needing a DHT could either use the IPFS DHT spec (what they have been doing so far by using IPFS DHT implementations), or take inspiration from the IPFS DHT spec and impls to make their own custom implementation. All of the current DHT implementations are somehow IPFS-centric, there is currently no good generic libp2p DHT implementation. This discussion probably needs its own issue.

I will rephrase to mention that it is an instantiation of the libp2p Kademlia DHT (rather that a specialization). I don't feel strongly about the phrasing here, but just wanted to clarify that the IPFS DHT is an example of libp2p DHT, but the IPFS spec is actually complete.

Copy link
Member

@lidel lidel Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ok with doing 2 for the document to flow better and over time aiming at 3, for reasons listed in #497 (comment)

Perhaps go further and change this paragraph to state "The IPFS Kademlia DHT specification extends libp2p Kademlia DHT with features related to CID, IPNS, and content providing."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


### Relation to the Amino DHT

Nodes participating in the public [Amino DHT Swarm](#amino-dht) MUST implement the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on how we should handle parts of the network that evolve over time?

For example:

  • Maybe we mandate TCP + Yamux + Noise for all DHT server nodes, or we mandate TCP + Yamux + Noise and QUIC for all DHT client nodes and say servers only need to support one. We could then encourage with SHOULDs other transports like WSS, WT, WebRTC, etc. and change from SHOULD to MUST as we see adoption rise past a certain point where we'd feel comfortable having clients rely on the new behavior from the servers.
  • Same if we decide to add new RPC methods / behaviors or change some server behavior without bumping the protocol version (e.g. IP / ASN limiting, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points!

  • I will add transports requirements
  • For new RPC methods/behaviors, I suggest we deal with it in a subsequent PR/IPIP since it will require some discussion


* `FIND_NODE`: In the request `key` must be set to the binary `PeerId` of the
node to be found. In the response `closerPeers` is set to the DHT Server's `k`
closest `Peer`s.
Copy link
Member

@achingbrain achingbrain May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the DHT server include itself in the closerPeers response field?

Can it include the requester?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered one of my questions:

Can it include the requester?

No. If this comment is accurate it's worth documenting - https://github.com/libp2p/go-libp2p-kad-dht/blob/010ef38a342f749a6fe380d2b67e651767850f3a/handlers.go#L265

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DHT Servers SHOULD NOT return their own Peer ID in responses to `FIND_NODE`
queries. However, they MUST include information about the requester, if and
only if the requester is a DHT Server in its routing table and it is among the
`k` closest nodes to the target key.


* `GET_PROVIDERS`: In the request `key` is set to the multihash contained in
the target CID. The target node returns the known `providerPeers` (if any) and
the `k` closest known `closerPeers`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as https://github.com/ipfs/specs/pull/497/files#r2092857416 about what peers can be in the closerPeers field

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DHT Servers SHOULD NOT return their own Peer ID in responses to `FIND_NODE`
queries. However, they MUST include information about the requester, if and
only if the requester is a DHT Server in its routing table and it is among the
`k` closest nodes to the target key.

If it helps, I can duplicate the information at the line you mentioned here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IPFS DHT Specification is missing
5 participants