XMiDT

Call For Consistent Hash Replacement

We’ve found that the consistent hash implementation in use is quite expensive in terms of compute once you reach around 200 virtual nodes. Recalculation time grows rapidly as you increase either the number of virtual nodes or the number of actual nodes. This limits our ability to scale beyond 200 Talarias in a DC because it slows the system response time down to an unacceptable level.
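To make the cost concrete, here is a minimal ring-hash rebuild sketch (an illustration, not the actual XMiDT implementation; names are made up): every membership change re-hashes and re-sorts nodes × vnodes entries, which is where the time goes.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"time"
)

// Rebuild constructs a classic hash ring from scratch: every node gets
// vnodes entries, all of which are hashed and then sorted. Cost is roughly
// O(N*V*log(N*V)), so doubling nodes or vnodes more than doubles the time.
func Rebuild(nodes []string, vnodes int) []uint64 {
	ring := make([]uint64, 0, len(nodes)*vnodes)
	for _, n := range nodes {
		for i := 0; i < vnodes; i++ {
			h := fnv.New64a()
			fmt.Fprintf(h, "%s-%d", n, i)
			ring = append(ring, h.Sum64())
		}
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i] < ring[j] })
	return ring
}

func main() {
	for _, n := range []int{50, 100, 200, 400} {
		nodes := make([]string, n)
		for i := range nodes {
			nodes[i] = fmt.Sprintf("talaria-%d", i)
		}
		start := time.Now()
		Rebuild(nodes, 200) // 200 virtual nodes per Talaria
		fmt.Printf("nodes=%d rebuild=%v\n", n, time.Since(start))
	}
}
```

At 200 Talarias with 200 virtual nodes each, that is 40,000 entries rebuilt on every membership change.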

John Bass & I have talked a bit about revamping this area of code for a while, but I wanted to post a call for help on this front because:

  1. It will likely impact everyone’s cluster.
  2. It will likely cause a “breaking change” to go into the system … so we need to somehow signal this impact via compatible server versions (or a better idea?).
  3. It’s something I’d like to discuss with the community more.

@schmidtw, what algorithm are you using right now? Which algorithms have you already tried?

I’ve found a post with interesting algorithms (most of them from Google) for consistent hashing: https://medium.com/@dgryski/consistent-hashing-algorithmic-tradeoffs-ef6b8e2fcae8

Some algorithms have limitations or tradeoffs, but I think it’s a good starting point.
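For example, jump hash (from the Lamping & Veach Google paper, one of the algorithms the article covers) is only a handful of lines; a sketch in Go:

```go
// JumpHash maps a 64-bit key to a bucket in [0, numBuckets).
// Port of the algorithm from "A Fast, Minimal Memory, Consistent Hash
// Algorithm" (Lamping & Veach).
func JumpHash(key uint64, numBuckets int32) int32 {
	var b, j int64 = -1, 0
	for j < int64(numBuckets) {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return int32(b)
}
```

Its tradeoff is the one the article calls out: buckets are numbered rather than named, and nodes can only be removed from the end of the range, so arbitrary node failure needs an extra indirection layer on top.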

Best,
Carlos

Carlos, the article you linked is very good. I had not found that one previously.

Based on the article, I think these traits seem like a good match:

  • Rebuild time - fast (this allows failures to be accounted for quickly)
  • Calculation time - fast (this allows routing to be quick)
  • Memory - don’t care that much (we generally have more memory than compute, so we can give on this front)
  • Load distribution - within ~5% of even seems good enough (see the sketch after this list for one way to measure it)
  • Needs to be tunable to thousands of nodes for very large deployments
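One way we might quantify that 5% figure (a methodology sketch, not existing XMiDT tooling): feed each candidate a large batch of synthetic keys and compare the busiest node’s load to the mean; a peak-to-mean ratio of about 1.05 would meet the target.

```go
package loadcheck

import "fmt"

// peakToMean reports how far the busiest node sits above the average
// load. get is whatever lookup the candidate algorithm exposes (the
// signature here is hypothetical). 1.0 is perfectly even; ~1.05 matches
// the "within ~5%" target from the list above.
func peakToMean(get func(key string) string, numKeys int) float64 {
	counts := make(map[string]int)
	for i := 0; i < numKeys; i++ {
		counts[get(fmt.Sprintf("mac:%012x", i))]++
	}
	peak, total := 0, 0
	for _, c := range counts {
		total += c
		if c > peak {
			peak = c
		}
	}
	mean := float64(total) / float64(len(counts))
	return float64(peak) / mean
}
```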

The implementation we’re presently using can be found here:

Ideally I’d like to take a few of these, write some code that matches the interface we need, and run some real benchmarks before we choose the next one.
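As a starting point, a hypothetical sketch of what that shared interface and benchmark harness could look like (names are illustrative, not an existing XMiDT or gokit API):

```go
package hashbench

import (
	"fmt"
	"testing"
)

// Hasher is a hypothetical interface each candidate algorithm would
// implement so the benchmarks can be shared across implementations.
type Hasher interface {
	Add(node string)                // membership change; may trigger a rebuild
	Remove(node string)             // membership change; may trigger a rebuild
	Get(key string) (string, error) // which node owns this key (e.g. a device ID)
}

// benchmarkGet measures steady-state lookup ("calculation") time.
func benchmarkGet(b *testing.B, newHasher func() Hasher) {
	h := newHasher()
	for i := 0; i < 200; i++ {
		h.Add(fmt.Sprintf("talaria-%d", i))
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		h.Get(fmt.Sprintf("mac:%012x", i))
	}
}

// benchmarkRebuild measures how quickly a membership change is absorbed.
func benchmarkRebuild(b *testing.B, newHasher func() Hasher) {
	h := newHasher()
	for i := 0; i < 200; i++ {
		h.Add(fmt.Sprintf("talaria-%d", i))
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		h.Remove("talaria-0")
		h.Add("talaria-0")
	}
}
```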

I think there is some merit in looking into adding a gokit interface for ring hashes, since it is likely other microservices will benefit.

Best,
Wes

It’s important to remember the relative frequency of each use case when considering which to optimize for: a lookup happens on every routed message, while a rebuild only happens when cluster membership changes. I would sacrifice 10x rebuild time for 2x calculation time.