S-Kademlia and multipaths

This post assumes you know what Kademlia is, and roughly how its distance metric and routing algorithms work.

Actually, there are many possible routing algorithms based on any given distance metric, and while we know how to analyse various properties of a routing algorithm, the field has not yet devised a single objective way to compare two routing algorithms, nor even settled what it means to be “the best” routing algorithm. Most of the reasonable ones follow a similar pattern though, roughly:

  1. each node maintains a routing table, which contains ~k “close” entries, ~k “further” entries, ~k “even further” entries etc, for super-linearly expanding notions of “further away”
  2. routing is done by choosing the neighbour in your routing table that is closest to the target, and then iterating to their neighbour, etc etc.
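These two steps can be sketched minimally in Python. Note the simplifications here are my own: node IDs are plain integers, and `routing_table` is a flat set rather than the real k-bucket structure described in step 1.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the XOR of two node IDs."""
    return a ^ b

def closest_neighbour(routing_table, target):
    """Step 2: pick the known neighbour closest to the target."""
    return min(routing_table, key=lambda n: xor_distance(n, target))

# With neighbours {1, 2, 3} and target 100, node 1 is closest:
# xor_distance(1, 100) = 101 < 102 < 103
assert closest_neighbour({1, 2, 3}, 100) == 1
```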

The devil is in the details of course, and one major detail is that in practice you want to do this in parallel. But how parallel?

The original paper defines a parallel routing algorithm roughly as follows:

  1. query the closest d neighbours
  2. out of the combined set of their results, select the closest next d
  3. loop until you’ve reached your target, or run out of patience
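The loop above can be sketched as follows; `query(node, target)` is a hypothetical stand-in for the network RPC (assumed to return up to k nodes the peer considers closest), and the deterministic closest-first selection is my own simplification.

```python
def parallel_lookup(start_peers, target, query, d=3, max_hops=20):
    """Repeatedly query the d closest known peers until the target is
    reached or no new peers are learned ("run out of patience")."""
    queried = set()
    frontier = sorted(start_peers, key=lambda n: n ^ target)[:d]
    for _ in range(max_hops):
        results = set()
        for node in frontier:
            if node == target:
                return node
            queried.add(node)
            results |= set(query(node, target))   # up to k results per peer
        candidates = sorted(results - queried, key=lambda n: n ^ target)
        if not candidates:
            break
        frontier = candidates[:d]                 # closest d of the combined set
    return min(queried, key=lambda n: n ^ target)

# Toy network: each peer's canned reply for any target.
replies = {1: {4, 5, 6}, 2: {4, 5, 6}, 3: {5, 6},
           4: {100}, 5: {100}, 6: {100}}
assert parallel_lookup([1, 2, 3], 100, lambda n, t: replies.get(n, set())) == 100
```

The toy `replies` table plays the role of the network; a real implementation would also handle timeouts and k-bucket maintenance.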

This is an insecure parallel routing algorithm: if even one of the neighbours is malicious, it can be trivially attacked. The malicious neighbour simply returns d fake nodes “very close” to the victim’s target that are in fact all controlled by the attacker. From this point onwards the attacker can lead the victim on a wild goose chase, repeatedly feeding them bad results that are again controlled by the attacker. This is an example of the general class of attacks called the “eclipse attack”.

S-Kademlia tries to protect against this, to quote: “The initiator starts a lookup by taking the k closest nodes to the destination key from his local routing table and distributes them into d independent lookup buckets. From there on the node continues with d parallel lookups similar to the traditional Kademlia lookup. The lookups are independent, except the important fact, that each node is only used once during the whole lookup process to ensure that the resulting paths are really disjoint.”

The meaning is hard to decode; worded slightly differently this means:

  1. keep track of d separate lookups for a given target. Each lookup is expected to hop through a bunch of different peers.
  2. when the results of the i-th lookup are returned, select the next hop for this i-th lookup out of these results, but ignore any nodes you’ve already previously queried in any of the d lookups
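This restatement can be sketched as follows. The assumptions here are mine: `DisjointLookups` is a hypothetical name, and the “arbitrary” selection is made deterministically as closest-by-XOR.

```python
class DisjointLookups:
    """d parallel lookups sharing one global `used` set, so each peer is
    queried at most once across the whole lookup (sketch of the rule above)."""

    def __init__(self, start_peers, target, d=3):
        self.target = target
        self.used = set(start_peers[:d])    # every peer queried so far, by ANY lookup
        self.heads = list(start_peers[:d])  # current head of each lookup

    def next_hop(self, i, results):
        """Select the next hop for lookup i, skipping peers used anywhere."""
        fresh = [r for r in results if r not in self.used]
        if not fresh:
            return None                     # lookup i stalls: no unused results
        best = min(fresh, key=lambda n: n ^ self.target)
        self.used.add(best)
        self.heads[i] = best
        return best
```

With this deterministic rule, replaying the example used later in this post (replies {4,5,6}, {4,5,6}, {1,5,6} towards target 100) yields picks 4, 5, 6 and no lookup stalls; but had the arbitrary picks been 6 then 5, node 3’s reply would contain no unused result and `next_hop` would return None. That sensitivity to internal choices is the problem examined below.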

(To make this work and to improve redundancy, S-Kademlia also has the closest d neighbours to a given target store the value for that target, instead of just the single closest node storing it. This is not very relevant for the discussion below so we’ll ignore it from here on and just focus on the actual routing part, before we reach the destination.)

The view then, is that we have d lookups, where each lookup hops through a different sequence of peers to reach one of the target storage nodes. The S-Kademlia paper calls these paths “disjoint paths” and asserts that being disjoint is a security property, leading some internet commenters to believe that losing disjointness means losing this security protection. However this view is not accurate: it purposefully discards a lot of information, omits handling some corner cases (which degrades performance in a real network), and gives an inaccurate picture of the security. Below, I’ll first describe why this view is inaccurate, then describe a more complete viewpoint.

Suppose d=3, k=3, and we issue a query for target 100 to our neighbours {1, 2, 3}. They reply, in order:

  1. node 1 replies {4,5,6}
  2. node 2 replies {4,5,6}
  3. node 3 replies {1,5,6} – note that this is perfectly legitimate in the general case since node 3 can’t be expected to know that we also have node 1 as a neighbour

According to S-Kademlia, one possible choice of paths would be {1, 6} and {2, 5}. Once you have made these choices, there is nothing further to do when you get a reply back from node 3 because all its results have already been used up.

However, this effectively reduces your query from this point onwards to be 2 lookups instead of 3, which is what we specifically wanted to avoid! By the problem assumptions, we have the resources to execute 3 lookups in parallel, but we end up under-utilising this due to too-restrictive rules on S-Kademlia’s part.

Let’s take a step back however. Why did we have to choose {1, 6} and {2, 5}? If we had chosen something else, namely {1, 4} and {2, 5}, we could have then chosen {3, 6} and this would still be within S-Kademlia’s own rules.

The problem arises from arbitrarily selecting one of a set of results, and then discarding the rest too early in a way that significantly affects the remainder of the algorithm. In other words, S-Kademlia’s rules as stated are not symmetric - they give significantly different outcomes (2 lookups vs 3 lookups) depending on the internal random choices we ourselves make, regardless of what the outside world tells us.

As a general principle in designing algorithms, the results of arbitrary decisions should be indistinguishable from each other (in the long run); otherwise the design of the algorithm is questionable. In the flawed S-Kademlia algorithm, the decisions are easily distinguishable (d vs d-1 parallel lookups), but in the version I give below this invariant is preserved.

Objectively speaking, there is no such thing as “disjoint paths”: they are an artificial construct made up for convenience to help explain the S-Kademlia algorithm, not the whole picture of what actually happens in reality. From the example above, since we chose 6, our query will use it within one of its paths; but we “arrived at” 6 via all three nodes {1, 2, 3}, so there is no reason to pretend that we only used {1, 6, …} as a “disjoint path”. The actual paths we used already overlap. If we wanted to, we could select some paths out of the ones we used, and these would be disjoint when ignoring the paths we did not select. However, this is a simple logical consequence of another property described below, one that contributes more directly to security and that we consider a more natural way of thinking about the problem, without selecting some paths and ignoring others.

First we start with a fuller picture. Between adjacent hops, what you actually have is a bipartite graph of hop-n query-peers to hop-n result-peers. More precisely: suppose for hop n, you have d nodes to query, and each of them gives you back up to k results. Then, you could have anywhere between 0 and d * k results, since different nodes could give you the same result. Call the query nodes Q[n] with Q[n].size = d and the result nodes R[n] with R[n].size <= d * k, then Q[n] and R[n] forms a bipartite graph with outdegree(v) = k for every v in Q[n]. To choose the query nodes for the next hop Q[n+1], we want to select up to d nodes out of R[n]. Instead of disjointness, we express our desired security property as follows:

  1. As long as R[n].size >= d, we must choose Q[n+1] such that its size = d.

    Note: if R[n].size < d then we are forced to select Q[n+1] = R[n] with size < d, since we simply don’t have enough information to continue. This could happen with a very bad or very small underlying network for example.

  2. We must also choose Q[n+1] such that the union of the sets { predecessor(v)
    | v in Q[n+1] } is equal to Q[n].

    In other words, all of the hop-n query peers “have vouched for” at least one of the hop-(n+1) query-peers, or equivalently the hop-(n+1) peers are not “vouched for by only a subset of the hop-n peers”. This latter constraint is a more complete description of why security is improved. By the pigeonhole principle, if you do this it is always possible to select d disjoint pairs (q, r) out of your hop-n and hop-(n+1) results, such that q is one of the nodes that r came from. So it still preserves the “disjoint” property that S-Kademlia talks about. But note this is an arbitrary choice; the whole picture is the bipartite graph of our queries vs our results.
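The two conditions can be stated as a small predicate. Here `origins` is an assumed representation of the bipartite graph: it maps each result peer in R[n] to the set of hop-n query peers that returned it.

```python
def satisfies_property(q_n, q_next, origins, d):
    """Check conditions (1) and (2) for a candidate choice of Q[n+1].
    `origins` maps each peer in R[n] to its predecessors in Q[n]."""
    r_n = set(origins)
    # (1) keep the full width d whenever R[n] allows it
    if len(q_next) < min(d, len(r_n)):
        return False
    # (2) every hop-n query peer must vouch for at least one chosen peer
    vouchers = set().union(*(origins[v] for v in q_next))
    return vouchers == set(q_n)

# From the example earlier: 4 came from {1, 2}; 5 and 6 came from {1, 2, 3}.
origins = {4: {1, 2}, 5: {1, 2, 3}, 6: {1, 2, 3}}
assert satisfies_property([1, 2, 3], {4, 5, 6}, origins, 3)   # full width, all vouched
assert not satisfies_property([1, 2, 3], {5, 6}, origins, 3)  # violates (1): width 2 < 3
# If every result had come only from node 1, condition (2) would fail:
assert not satisfies_property([1, 2, 3], {4, 5, 6}, {4: {1}, 5: {1}, 6: {1}}, 3)
```

In a running lookup, `origins` would be maintained incrementally as replies arrive, as in the worked example below.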

If (1) or (2) is broken, then this means our results after hop n have been given to us by fewer than d nodes, and {any other nodes that we might have previously talked to} have no representation in these results. Note that even a drop from d to (d-1) is bad, as it affects all remaining hops, where the number could drop even further. We want to keep this number at d for as long as we can. This is “morally speaking” what we and S-Kademlia want to avoid: an incomplete fraction of any query set controlling the overall query.

So using the above simple example, you could do something like this:

  1. node 1 replies {4,5,6}
    a. remember that: node 4 came from 1, node 5 came from 1, node 6 came from 1
    b. arbitrarily choose any result that comes from 1, e.g. 6
  2. node 2 replies {4,5,6}
    a. remember that: node 4 came from [1,2], node 5 came from [1,2], node 6 came from [1,2]
    b. at least one previously-chosen result {6} also came from 2, so
    c. arbitrarily choose any other result that was not already chosen, e.g. 5
  3. node 3 replies {1,5,6}
    a. filter this down to {5,6} as 1 was already queried in previous hops
    b. remember that: node 4 came from [1,2], node 5 came from [1,2,3], node 6 came from [1,2,3]
    c. at least one previously-chosen result {5, 6} also came from 3, so
    d. arbitrarily choose any other result that was not already chosen, e.g. 4

Note: In (2b) (resp. (3c)), if none of the previously-chosen results had come from node 2 (resp. node 3), then in the following step (2c) (resp. (3d)) we would instead arbitrarily choose one of the results directly from the reply. In fact this logic also applies in step (1b), but we note it explicitly here for clarity; in (1b) there are no previously-chosen results, so the condition is vacuously true.

Note: For a more complete version of this algorithm that gracefully deals with the case when R[n].size < d, see here.
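The walkthrough above can be sketched as a whole, with some assumptions of my own: `FixedLookup` is a hypothetical name, only a single hop is modelled, and the “arbitrary” picks are made deterministically closest-by-XOR (so the concrete picks differ from the walkthrough, which is fine precisely because they are arbitrary).

```python
class FixedLookup:
    """One hop of the fixed selection logic (illustrative sketch)."""

    def __init__(self, start_peers, target):
        self.target = target
        self.queried = set(start_peers)  # peers queried at previous hops
        self.origins = {}                # result peer -> peers that returned it
        self.chosen = set()              # next-hop queries issued this hop

    def recv_result(self, peer, results):
        for r in results:
            if r in self.queried:
                continue                 # e.g. node 1 inside node 3's reply
            self.origins.setdefault(r, set()).add(peer)
        # does some previously-chosen result already "come from" this peer?
        covered = any(peer in self.origins[c] for c in self.chosen)
        # if covered, pick from everything remembered; if not, we must pick
        # from this very reply so that `peer` vouches for something chosen
        pool = set(self.origins) if covered else set(results) - self.queried
        fresh = pool - self.chosen
        if not fresh:
            return None
        pick = min(fresh, key=lambda n: n ^ self.target)
        self.chosen.add(pick)
        return pick

q = FixedLookup([1, 2, 3], 100)
assert q.recv_result(1, {4, 5, 6}) == 4   # nothing chosen yet, pick closest
assert q.recv_result(2, {4, 5, 6}) == 5   # 4 also came from 2, so pick any other
assert q.recv_result(3, {1, 5, 6}) == 6   # 5 also came from 3, so pick any other
```

A full implementation would also need to handle failed queries and the R[n].size < d corner case mentioned above.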

To compare this with S-Kademlia: instead of choosing one result (and sending a next-hop query to it) and discarding the rest, we choose one but keep the rest for later reference. The intuition is that we might “backtrack” our choice; but since we will have already queried that node, it would be inefficient to backtrack the query itself (by cancelling it or ignoring its result), and the more efficient thing is to locally “move” this query into a different disjoint path. However, we suggest that even this way of looking at the picture is unnecessarily complex and convoluted: one should not implement the algorithm by literally reassigning queries to different lookup paths, and the more natural viewpoint and implementation is as we described previously.

Extended across a whole sequence of 0…n hops, I find it suitable to name the whole collection of bipartite graphs as “fat multipaths” - in this way you are acknowledging all of the individual paths that exist, that form the whole parallel lookup query.

Note that this principle of a fat multipath is applicable not just to Kademlia DHT routing, but in fact to all parallel routing, where you want to retain query diversity, and don’t want to let the end result be fully-controlled by an incomplete fraction of any part of the query set.

Further research to adapt this to other routing schemes would be good follow-up work. The smallest non-trivial instance of this general problem would be:

Suppose you have two query nodes, P, Q, and you make the same query to both of them:

  1. P gives you {X, A}
  2. then Q gives you {X, B}

In step (1) you have no information, so you need to arbitrarily choose one of them. Suppose you chose X, then in step (2) you are forced to choose B. So there is nothing to “optimise” here. However suppose in step (1) you chose A, then what is better in step (2) - to choose X or B? The difference is that P also gave you X previously, but B is a completely-new result. Is it better to choose the former, or the latter?

To compare, in our skad algorithm this would be the difference between setting prioritise_unique_results to True or False.
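As a toy illustration of this open question: the function below is my own invention (only the `prioritise_unique_results` knob itself comes from the skad algorithm mentioned above), showing how the two policies diverge on the P/Q example.

```python
def choose(reply, chosen, seen_before, prioritise_unique):
    """Pick a next hop from `reply`, preferring results no other peer has
    yet returned when prioritise_unique is set, else the opposite."""
    fresh = [r for r in reply if r not in chosen]
    unique = [r for r in fresh if r not in seen_before]
    repeated = [r for r in fresh if r in seen_before]
    pool = (unique or repeated) if prioritise_unique else (repeated or unique)
    return pool[0]

# P replied {X, A} and we chose A; now Q replies {X, B}:
assert choose(["X", "B"], {"A"}, {"X", "A"}, True) == "B"    # the brand-new result
assert choose(["X", "B"], {"A"}, {"X", "A"}, False) == "X"   # the corroborated result
```

Whether True or False is the better default is the question left open above.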


Thanks for the detailed write-up @infinity0!

Following up on the scenario above where there are no unused results within node 3’s result set, thus reducing d from 3 to 2: the S-Kademlia paper says:

We also see that with k = 16 there is enough redundancy in the k-buckets to actually create disjoint paths.

Thereby one could rely on the fact that this is a reasonably rare case to hit, and when hit, prioritize unique results, but fall back to any, in order to not lower d for all future rounds, correct? If I am not mistaken this is also the scenario in your first test case within your sample algorithm.

In a situation with no attackers, this would be reasonably rare for most of the path, but may well be reasonably common the closer we get to the target, since peers closer to the target have a higher chance that this part of their k-buckets overlap.

In a situation with attackers, if you accidentally select a malicious hop-2 node via an honest hop-1 node, S-Kademlia will fail to recover whereas the tweak I suggested will recover by selecting another hop-2 node out of the hop-1 results.

“Prioritising unique results” is a separate issue, it’s just a parameter that tweaks the way in which you avoid reducing d. The fix I suggested will keep d at its intended value even if you set this to false. Right now I’m unsure which value {true,false} is better. I couldn’t think of any obvious way to break the symmetry, so decided to leave this question open for the future.

I guess I have a different understanding or interpretation of what “disjoint” is supposed to mean for the S/Kademlia lookup, because having path 3 in this example choose 4 as the next node to query in (3d) violates disjointness from my perspective. I essentially use the following definition: two query paths are disjoint if the sets of nodes queried on each path are disjoint, and every node queried on a path was discovered earlier on that same path. To me, this is what ensures that, as long as one path is free of adversarial nodes, the lookup can succeed. By having one path query a node discovered only on another path, as suggested above, that path is now potentially “poisoned”. So although you retain 3 paths which are disjoint just in terms of the sets of peers queried, two of them are no longer disjoint according to my understanding of disjointness, because one path was allowed to influence another, and thus from a security perspective there are again only 2 “disjoint paths” left.

The point is not to get hung up on the concept of “disjointness”. You can think about it this way: path 3 does not actually choose 4, it only appears that way because path 1 chose 6 first. However if path 3 got its result earlier, it would have chosen 6, then path 1 would have chosen 4, and “disjointness” is preserved. We simply tweak the algorithm so this possibility happens, regardless of which result is received first. The paths are an artificial arbitrary construct, what is real is the graph of nodes and their query results which point to other nodes.

The security property can be described in more general terms as “at all times, all d outstanding queries must have come from a total of d previous nodes”, which is what my version of the algorithm preserves. (Then there are the fixes for the other corner cases I mentioned in that PR.)

If you have this property, then you will be able to select d disjoint paths from the final graph, regardless of the order in which the query nodes were returned to you. OTOH implementing the algorithm in terms of actual paths, makes it sensitive to this ordering and occasionally will result in fewer nodes being traversed.

I’ve continued to think about the problem, here is a more “theoretically nice” way of explaining the approach & implementing the algorithm:

Intuitively speaking, the security property can be described as a maximum-flow property via the initial nodes of our graph. (Note that calculating trust values based on flow algorithms is not a new idea, it’s been done before e.g. in the advogato metric.) We arrange the problem as a maximum-flow problem where any maximum flow must pass through all of our initial nodes.

We initialise the query with d peers. As we query peers, we are building up a graph of who knows about who. By ignoring results that are further away than the query node, we ensure that the graph is a DAG.

Now we construct a flow-graph out of our query-graph:

  1. A single flow-source node has an edge to all the initial nodes.
  2. All nodes (from our query graph) have an additional edge going into our flow-sink.
  3. All edges are of capacity MAX, except that edges from step (2) are of capacity (MAX - DHT distance) i.e. the closer the DHT distance, the larger the capacity.

MAX is chosen so that the largest node-sink capacity is smaller than the sum of the two smallest node-sink edge capacities.

This means the max-flow algorithm will

  1. favour nodes closer in the DHT
  2. favour going through 2 nodes rather than 1, even if the single node is closer.

(2) is our equivalent of the “disjoint paths” property.

Then, we simply run a max-flow algorithm over this graph to tell us what the closest k nodes are on “disjoint” paths.

During the running of the algorithm, whenever we receive a new set of results from a queried peer, we run the max-flow algorithm over the graph that includes all thus-far unqueried nodes, to tell us what the next d nodes are to query.

(d-1) of these should overlap with the previous d nodes queried, and a single new node would be returned to query. A sketch of why this is true: each new query result only adds edges through one node (the queried node) to our graph, meaning that (due to our flow constraints) at most one of the new result nodes can possibly be used in any maximum flow. Nothing else about the rest of the graph has changed, so the rest of the results would overlap.

(Since most of the edges are the same capacity (MAX) it is likely possible to optimise the algorithm, but I want to run some more tests on this one first.)

An example. A query graph that looks like this:

     3 -> 2 -> 1
     4 ---|
     5 ---|
7 -> 6 ---|
9 -> 8 ---|

gets converted into a flow graph that looks like this:

     |-------------> 3 -> 2 -> 1 --------------|
     |                    ^                    |
     |-------------> 4 ---|                (MAX - 1)
     |                    |                    |  [ other edges not shown for brevity ]
     |-------------> 5 ---|                    v
 [Source]                 |                 [Sink]
     |--------> 7 -> 6 ---|
     |                    |
     |--------> 9 -> 8 ---|

All edges have capacity MAX, except edges to the sink which have capacity (MAX - distance).

We choose MAX such that (MAX - 1) < (MAX - 9) + (MAX - 8) i.e. MAX + 16 < 2*MAX i.e. MAX > 16.

The nice thing about the above approach is that no special logic is needed to deal with queries that fail - simply mark the query as having returned no results, and call the max-flow algorithm again.

Unfortunately, in playing with code that implements this I realised that it’s not actually the max-flow problem, but the max-saturated-flow problem, where a flow is not allowed to partially go through an edge. Sadly, this problem is NP-hard and it doesn’t seem to have a common solution. I’ll keep looking however, because my solution based on this covers all of our corner cases so far in a very straightforward manner.

OK, I’ve figured out a workaround which is efficient enough - iterate through all d-sized combinations of “possible queries” and return the minimum one whose max-flow equals d. For example if we have nodes 1-10 as potential queries, d=3, then we check if nodes {1,2,3} have a max-flow equal to d via the initial nodes, if not we then check {1,2,4}, etc etc. Since all these numbers are small (~20), the cost is not so bad.
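This workaround can be sketched as below, restricted (an assumption of my own) to a single hop: `origins` maps each known result peer to the set of peers that returned it, and a unit-capacity max flow of d certifies that all d initial peers vouch for distinct members of the candidate combination.

```python
import itertools
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a nested dict {u: {v: capacity}}. All capacities
    here are 0 or 1, so every augmenting path pushes exactly 1 unit."""
    total = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:          # BFS for an augmenting path
            u = queue.popleft()
            for v, c in cap.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total
        v = t
        while parent[v] is not None:              # walk back, push 1 unit
            u = parent[v]
            cap[u][v] -= 1
            rev = cap.setdefault(v, {})
            rev[u] = rev.get(u, 0) + 1            # residual (reverse) edge
            v = u
        total += 1

def pick_queries(initial, origins, d):
    """Iterate d-sized combinations of known results in sorted order and
    return the first whose max flow from the initial peers equals d, i.e.
    each initial peer vouches for a distinct chosen result."""
    for combo in itertools.combinations(sorted(origins), d):
        cap = {'S': {i: 1 for i in initial}}
        for r, vouchers in origins.items():
            for i in vouchers:
                cap.setdefault(i, {})[r] = 1
            if r in combo:
                cap.setdefault(r, {})['T'] = 1
        if max_flow(cap, 'S', 'T') == d:
            return combo
    return None

# Results {4,5,6} with enough distinct vouchers admit a width-3 choice:
assert pick_queries([1, 2, 3], {4: {1, 2}, 5: {1, 2, 3}, 6: {1, 2, 3}}, 3) == (4, 5, 6)
# ...but if every result came only from node 1, no width-3 choice exists:
assert pick_queries([1, 2, 3], {4: {1}, 5: {1}, 6: {1}}, 3) is None
```

This sketch covers only the first hop; in the general multi-hop DAG the `origins` relation would presumably chain through intermediate queried peers, with node capacities (via node splitting) enforcing that each peer carries flow only once.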

The code has been updated and runs fine with all the examples in the OP, except prioritise_unique_results has been dropped for now, it’s a bit fiddly to unify with the max-flow approach. There are a few additional highlights:

  q = SKadLookup([10,11,12], 0)
  assert(q.recv_result(10, {5,6}) == 5)
  assert(q.recv_result(11, {6,7}) == 6)
  assert(q.recv_result(12, {8}) == 8)
  assert(q.recv_result(5, {1}) == 1)
  assert(q.recv_result(1, None) == 7)

This demonstrates that the algorithm can backtrack multiple levels correctly:

  • initial nodes are 10, 11, 12, trying to reach key 0 (makes the distance calculations easier)
  • 10 gives back {5, 6}, we query 5
  • 11 gives back {6, 7}, we query 6
  • 12 gives back {8}, we query 8
  • 5 gives back {1}, we query 1
  • but this query fails! should we try 6 instead? no, because 6 is already being tried. so we query 7, which is the closest untried node so far that retains the “disjointness” property.
  • at the end of this example, we have outstanding queries 6, 7, 8, which can be thought of as coming from the initial nodes 10, 11, 12 respectively
  • if we had implemented this as “disjoint paths”, there is no possible way that path 1 (10 -> 5 -> 1) would be able to backtrack to query node 7 later - it would see that node 6 is already being queried and be prevented from querying this node, but nothing else in path 1 refers to node 7

Tweaking the example to backtrack twice:

  q = SKadLookup([10,11,12], 0)
  assert(q.recv_result(10, {5,6}) == 5)
  assert(q.recv_result(11, {6,7}) == 6)
  assert(q.recv_result(12, {8}) == 8)
  assert(q.recv_result(5, {1,2}) == 1)
  assert(q.recv_result(1, None) == 2)
  assert(q.recv_result(2, None) == 7)

Yup, works nicely too!

@mxinden @romanb As requested I’ve written a newer post with the max-flow-min-cost stuff explained in more detail.