Mainline DHT

Summary

Mainline DHT is the name given to the Kademlia-based distributed hash table (DHT) used by BitTorrent clients to find peers via the BitTorrent protocol. The idea of using a DHT for distributed tracking in BitTorrent was first implemented[1][2] in Azureus 2.3.0.0 (now known as Vuze) in May 2005, from which it gained significant popularity. Unrelated but around the same time, BitTorrent, Inc. released a similar DHT into their client called Mainline DHT, and thus popularized the use of distributed tracking in the BitTorrent protocol. Measurement showed that by 2013, the concurrent number of users of Mainline DHT is from 16 million to 28 million, with intra-day changes of at least 10 million.[3]

Description edit

Mainline DHT is based on the popular Kademlia DHT design.[4] Previous to usage of a DHT for distributing peers, trackers were the only method of finding peers. The key feature of using the DHT over trackers is that the decentralized approach favours the nature of the BitTorrent protocol. The DHT operates by distributing lists of peers identified by the SHA-1 hash of the torrent.

Operation edit

The SHA-1 hash of a torrent, the infohash, is synonymous with a Kademlia key, which is used for finding peers (values) in the overlay network. To find peers in a swarm, a node sends a get_peers query with the infohash as key (equivalent to a Kademlia FIND_VALUE) to the closest known nodes (with respect to the key distance). Like Kademlia, if the node does not return the value (peers), it persists further in an iterative operation. However, after the search is exhausted, the client then also inserts the peer contact information for itself onto the responding nodes with IDs closest to the infohash of the torrent.

Token edit

Nodes use an additional measure known as a token to ensure others don't sign up other hosts for torrents. The return value for a query for peers includes this opaque value. For a node to announce that its controlling peer is downloading a torrent, it must present the token received from the same queried node in a recent query for peers. When a node attempts to "announce" a torrent, the queried node checks the token against the querying node's IP address.

Mainline DHT uses the SHA-1 hash of the IP address concatenated onto a secret that changes every five minutes for a token value. Tokens up to ten minutes old are accepted.

KRPC edit

A node in the Mainline DHT consists of an IP and port combination. Nodes communicate via an RPC protocol called KRPC. KRPC is a simple protocol that consists of nodes sending messages (queries, replies and errors) containing bencoded dictionaries over UDP.

A KRPC message is a single dictionary with two keys common to every message and additional keys depending on the type of message. Every message has a key "t" with a string value representing a transaction ID. This transaction ID is generated by the querying node and is echoed in the response, so responses may be correlated with multiple queries to the same node. The transaction ID should be encoded as a short string of binary numbers, typically two octets are enough as they cover 2^16 outstanding queries. The other key contained in every KRPC message is "y" with a single character value describing the type of message. The value of the "y" key is one of "q" for query, "r" for response, or "e" for error.

Queries edit

Queries, or KRPC message dictionaries with a "y" value of "q", contain two additional keys; "q" and "a". Key "q" has a string value containing the method name of the query. Key "a" has a dictionary value containing named arguments to the query.

Responses edit

Responses, or KRPC message dictionaries with a "y" value of "r", contain one additional key "r". The value of "r" is a dictionary containing named return values. Response messages are sent upon successful completion of a query.

Errors edit

Errors, or KRPC message dictionaries with a "y" value of "e", contain one additional key "e". The value of "e" is a list. The first element is an integer representing the error code. The second element is a string containing the error message. Errors are sent when a query cannot be fulfilled.

Routing table edit

Buckets are structured differently from those in Kademlia. Instead of a list of 160 buckets, BitTorrent starts with only one bucket. When a bucket becomes full, one of two things can happen:

  • The bucket is split.
  • Old nodes are pinged (as in Kademlia).

Splitting is an operation that occurs only if our own node ID falls within the range of the bucket. The bucket being split is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones.

There are two benefits to this bucket implementation:

  • Less memory is used for a routing table of less than 160 buckets.
  • When searching buckets, it isn't necessary to retrieve additional nodes from adjacent buckets because it is guaranteed that there are enough in the current bucket.

Implementations edit

Mainline DHT was first included in version 4.2.0 of the BitTorrent software (November 2005). Since then, it has been implemented by a number of other clients:

References edit

  1. ^ Jones, Ben (7 June 2015). "BitTorrent's DHT Turns 10 Years Old". TorrentFreak. Retrieved 2015-07-05.
  2. ^ "Vuze Changelog". Azureus.sourceforge.net. Archived from the original on 2006-12-01. Retrieved 2012-07-15.
  3. ^ Wang, Liang; Kangasharju, Jussi (2013). "Measuring Large-Scale Distributed Systems: Case of BitTorrent Mainline DHT" (PDF). IEEE Peer-to-Peer. Retrieved 26 October 2013.
  4. ^ Loewenstern, Andrew; Norberg, Arvid (2013-03-22). "DHT Protocol". BitTorrent.org. Retrieved 2021-11-26.
  5. ^ "About – Deluge".

External links edit

  • Official specification and extensions for IPv6 support
  • BitTorrent Tech Talks: DHT
  • Mainline DHT Measurement