Friday, November 19, 2010

HA_Proxy load balancing leads to better Connexions performance


We have two performance milestones coming up at Connexions: one to improve authoring performance by reducing the size of certain catalogs whose lookup time grows with their size, and one to improve viewing times by making our slowly changing content cacheable.

In preparation for these milestones, we are updating our basic request architecture, and we have already seen some nice improvements in the average times for viewing content and using the authoring system. And this is before starting on the real performance work.

We had been using Squid to load balance between front-end ZEO servers, and we switched to HA_Proxy for load balancing. Take a look at these graphs of average load times. The Y-axis shows the time to service a particular request. We measure a few different requests, and each shows up in a different color. The graph on the left shows performance for the last month and the one on the right shows performance for the year. You should notice a dramatic drop in service times a week and a half ago, when HA_Proxy was added for load balancing.

The graph below shows the same timings, but for actions that authors take. It shows a similar speedup (lower height lines) for authoring. In the graphs for authoring you may also notice a major performance improvement in February of this year. That improvement was thanks to increasing the size of an application object cache.

So why would changing the load balancer have such a big benefit? We were surprised at the magnitude, but not the direction, of the improvement. Squid isn't specifically designed to do load balancing, and we were using the Internet Cache Protocol (ICP) to approximate it. Squid would send each front end an ICP request and then use the speed at which each one responded "no" to decide which should get the request. HA_Proxy is designed to do load balancing. It did take some configuration, but it is working much better than trying to contort ICP into a load balancer. Our settings for HA_Proxy choose the front end with the fewest current connections and then choose among equals using round-robin.
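To make that selection rule concrete, here is a minimal Python sketch of "fewest current connections, round-robin among equals." It is only an illustration of the idea, not HA_Proxy's actual implementation or our configuration; the `Frontend` and `LeastConnBalancer` names and the front-end labels are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Frontend:
    name: str
    active: int = 0  # connections currently open to this front end


class LeastConnBalancer:
    """Pick the front end with the fewest open connections.

    Ties are broken round-robin: the scan starts just after the
    front end chosen last time, so equally loaded servers take turns.
    """

    def __init__(self, names):
        self.frontends = [Frontend(n) for n in names]
        self._start = 0  # index where the next scan begins

    def pick(self):
        n = len(self.frontends)
        # Rotate the scan order so ties do not always go to the first server.
        order = [self.frontends[(self._start + i) % n] for i in range(n)]
        # min() returns the first minimal element in scan order.
        chosen = min(order, key=lambda f: f.active)
        self._start = (self.frontends.index(chosen) + 1) % n
        chosen.active += 1
        return chosen

    def release(self, frontend):
        # Call when the request has been served and the connection closes.
        frontend.active -= 1


balancer = LeastConnBalancer(["fe1", "fe2", "fe3"])
first_three = [balancer.pick().name for _ in range(3)]
```

With every front end idle, the first three requests rotate through all three servers. Once one of them finishes a request and is released, it has the fewest connections and gets the next request, regardless of whose turn it would have been.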

For background on our traffic, Connexions serves about 2 million unique visitors per month. We receive between 50 and 60 requests per second, peaking at around 100 requests per second. Our performance is still spiky. Occasionally a request takes a very long time to serve, and that can be very frustrating for viewers and authors. We will continue to report on performance in the blog as we improve the infrastructure.

4 comments:

  1. Oh my. I'm not surprised you saw a big difference, but I'm afraid much of the reason is likely that your Squid was not configured correctly. You are correct that ICP is not designed for this sort of load balancing. ICP is designed for multiple cache proxies to communicate with each other, not to load-balance requests to a backend server. With ICP turned on, each request to the backend is preceded by an ICP request. Naturally, this slows down the entire transaction. A better way is to turn off ICP and instead use Squid's backend health checking.

    I suspect that, even with a properly configured Squid, HAProxy may still be quicker but the difference will probably not be nearly as dramatic.

    Cheers,
    Ricardo Newbery

  2. I agree with Ricardo, ICP could have been causing more problems than it solves. That was certainly my experience of ICP and Zope when we tried it years ago.

    That said, HAProxy does do a much better job at load balancing than most. We saw results similar to yours when we switched from Apache mod_balance to HAProxy a few weeks ago. The main reason is that HAProxy's least-open-connections algorithm actually works, unlike mod_balance's bybusyness one, which doesn't. Bybusyness is fine until a backend is taken down, at which point it goes haywire and starts directing all traffic to one backend.

    HAProxy also helps in that it will take an overloaded backend out of the pool gracefully. This means that if you get a very long-running request which maxes out one backend, other requests will be directed elsewhere. Also, the throttling HAProxy does is very good for Zope: it sends just a few requests through when a backend comes up and slowly brings it up to speed over a few minutes, allowing the caches to warm up.

    -Matt

  3. Yep -- we suspected the ICP contortion was causing problems, but we didn't realize how much. Good to hear about positive experience with HAProxy out there.
