We’ve found that the consistent hash implementation in use is quite expensive in terms of compute when you use around ~200 virtual nodes. The recalculation takes exponentially longer as you increase either the number of virtual nodes or the number of actual nodes. This limits our ability to scale to beyond 200 Talarias in a DC because it slows the system response time down to an unacceptable level.
John Bass & I have talked a bit out revamping this area of code for a while, but I wanted to post a call for help on this front because:
- It will likely impact everyone’s cluster.
- It will likely cause a “breaking change” to go into the system … so we need to somehow show this impact in compatible server versions (or a better idea?).
- Is something I’d like to discuss a community more.