SRLB: a better load balancer than Google's Maglev for Linux
SRLB is the effort to bring load-balancing to Segment Routing with IPv6 (SRv6).
Now, why SRv6 in particular? Because the 128 bits of an IPv6 address are so big that they can not only hold more addresses than IPv4's 32 bits but also carry other data, such as the address of a function.
In SRv6, the part of the address that identifies the executing machine is called the Locator, while the function to execute is identified by the Function field.
Once functions are addressable on the network, the advantage of Segment Routing becomes obvious: we can force traffic through certain segments, and hence through certain functions, in a chain. Much like in a computer program, network programming suddenly becomes a thing.
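To make the Locator/Function split concrete, here is a minimal sketch that cuts a 128-bit SID into its two parts. The 64/64 split, the example address and the helper names `split_sid` and `make_sid` are assumptions made for illustration; real deployments choose their own field widths.

```python
import ipaddress

# Assumption: a 64-bit locator followed by a 64-bit function field.
# This split is illustrative only; it is not fixed by the text above.
LOCATOR_BITS = 64


def split_sid(sid: str, locator_bits: int = LOCATOR_BITS):
    """Split an SRv6 SID into (locator, function) integers."""
    value = int(ipaddress.IPv6Address(sid))
    function_bits = 128 - locator_bits
    locator = value >> function_bits                # high-order bits: which node
    function = value & ((1 << function_bits) - 1)   # low-order bits: which function
    return locator, function


def make_sid(locator: int, function: int, locator_bits: int = LOCATOR_BITS) -> str:
    """Rebuild the textual SID from its two parts."""
    function_bits = 128 - locator_bits
    return str(ipaddress.IPv6Address((locator << function_bits) | function))


locator, function = split_sid("2001:db8:0:1::100")
print(hex(locator), hex(function))
```

With this split, two SIDs that share a locator reach the same machine but trigger different functions on it.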
Service Function Chaining (SFC) is now supported by IPv6 Segment Routing. A service chain can be seen as an ordered set of Virtual Network Functions (VNFs), each represented by its IPv6 address. We assume that VNFs are hosted in "NFV nodes".
The srext module is used on a Linux NFV node to support legacy (i.e. "SR-unaware") VNFs. It was written in March 2017 by the netgroup. However, it does not offer a load-balancing feature comparable to Google's Maglev.
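A service chain expressed as an ordered list of VNF addresses maps naturally onto the segment list of an SRv6 header. The sketch below is a simplified Python model, not a packet encoder: the class name, the method names and the documentation-prefix addresses are hypothetical. It follows the SRv6 convention of storing the list in reverse order and using a Segments Left index to point at the active segment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SegmentRoutingHeader:
    segments: List[str]   # reverse order: segments[0] is the last VNF of the chain
    segments_left: int    # index of the currently active segment

    @classmethod
    def from_chain(cls, chain: List[str]) -> "SegmentRoutingHeader":
        """Build the header from a chain listed in traversal order."""
        reversed_chain = list(reversed(chain))
        return cls(reversed_chain, len(reversed_chain) - 1)

    def active(self) -> str:
        """Address the packet is currently heading to."""
        return self.segments[self.segments_left]

    def advance(self) -> str:
        """Move to the next VNF, as a node does once its function has run."""
        if self.segments_left == 0:
            raise ValueError("no segments left")
        self.segments_left -= 1
        return self.active()


# Hypothetical three-VNF chain, listed in the order it should be traversed.
chain = ["2001:db8:a::1", "2001:db8:b::1", "2001:db8:c::1"]
srh = SegmentRoutingHeader.from_chain(chain)
print(srh.active())
```

Each call to `advance()` corresponds to one VNF handing the packet to the next one in the chain.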
Maglev is Google’s network load balancer. It is a large distributed software system that runs on commodity Linux servers. Unlike traditional hardware network load balancers, it does not require a specialized physical rack deployment, and its capacity can be easily adjusted by adding or removing servers.
The main issue is that a Maglev machine matches packets to their corresponding services and spreads them evenly across the service endpoints. In other words, the machine has to be aware of all service replicas and determine which are busy and which have an acceptable load. This scales poorly: in practice the machine cannot query every service node at once without flooding the services with requests as the number of machines increases.
SRLB leverages Segment Routing to let a busy node forward the request to another node itself. Statistically, a single extra hop is enough in most cases [1]. Combine that with SR policies, whose traffic is load-balanced on a weighted basis among the SID lists associated with the selected path of the SR Policy, and SFC works better on commodity Linux servers.
Contrary to Maglev, 6LB requires agents on the NFV endpoints. The agent consists of an out-of-tree module that registers pre- and post-routing functions to craft proper SRv6 packets based on a load estimate of the queried application. If the application is busy, the agent forwards the packet to the next node, which the load balancer chose beforehand.
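The idea of letting a busy node pass the request on can be sketched in a few lines. This is an illustrative model under stated assumptions, not the SRLB implementation: the load balancer picks two candidates at random, the load is a plain per-server counter, the threshold is arbitrary, and the last candidate always accepts so that no request is dropped. The function names `pick_candidates` and `dispatch` are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def pick_candidates(servers: Sequence[str], rng: random.Random, k: int = 2) -> List[str]:
    """Load balancer side: choose k distinct candidates at random."""
    return rng.sample(list(servers), k)


def dispatch(candidates: Sequence[str], load: Dict[str, int], threshold: int) -> str:
    """Walk the candidate list; a busy node passes the request on.

    Assumption: the last candidate always accepts, whatever its load.
    """
    for server in candidates[:-1]:
        if load[server] < threshold:
            load[server] += 1
            return server
    last = candidates[-1]
    load[last] += 1
    return last


rng = random.Random(0)
load = {"s1": 0, "s2": 0, "s3": 0, "s4": 0}
for _ in range(12):
    dispatch(pick_candidates(list(load), rng), load, threshold=3)
print(load)
```

The load balancer never needs to know the load of any server here; each decision is taken locally by the node that receives the request.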
The module registers:
- a pre-routing function
- a shared memory segment
- a post-routing function
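How these three pieces could fit together is sketched below in Python rather than kernel code. Everything here is an assumption made for illustration: a `bytearray` stands in for the shared memory segment, its layout is a single 32-bit busy-worker counter, packets are plain dictionaries with a reverse-ordered segment list, and the last candidate always accepts.

```python
import struct
from typing import Dict

BUSY_FMT = "<I"  # hypothetical layout: one little-endian 32-bit busy-worker count


def write_busy(segment: bytearray, busy: int) -> None:
    """Application side: publish how many workers are busy."""
    struct.pack_into(BUSY_FMT, segment, 0, busy)


def read_busy(segment: bytearray) -> int:
    """Agent side: read the application's load estimate."""
    return struct.unpack_from(BUSY_FMT, segment, 0)[0]


def pre_routing(packet: Dict, segment: bytearray, threshold: int) -> bool:
    """Return True to deliver locally, False to pass the request on."""
    is_last_candidate = packet["segments_left"] == 0
    return is_last_candidate or read_busy(segment) < threshold


def post_routing(packet: Dict) -> Dict:
    """Rewrite a refused packet so that it heads for the next segment."""
    left = packet["segments_left"] - 1
    return {**packet, "segments_left": left, "dst": packet["segments"][left]}


segment = bytearray(struct.calcsize(BUSY_FMT))  # stand-in for shared memory
write_busy(segment, 7)                          # the application reports 7 busy workers
packet = {
    "dst": "2001:db8:a::1",
    "segments": ["2001:db8:b::1", "2001:db8:a::1"],  # reverse order
    "segments_left": 1,
}
if not pre_routing(packet, segment, threshold=4):
    packet = post_routing(packet)
print(packet["dst"])
```

Keeping the load estimate in a shared segment means the application only has to update a counter, while the accept-or-forward decision stays in the agent.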
[1] Mitzenmacher, M., "The power of two choices in randomized load balancing", IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, pp. 1094-1104, 2001.