Why Apache Benchmark Is Not Enough

By Manuel Pichler, first published at Tue, 05 Sep 2017 10:26:19 +0200

Download our free e-book "Crafting Quality Software" with a selection of the finest blog posts as PDF or EPub.

You can also buy a printed version of the book on Amazon or on epubli.

You have been working for months on a new web application or e-commerce system, and usually only a few weeks or even days before the launch is a sufficiently complete feature set running on a production-like system, so that you can run a realistic load-test. Hopefully, that test gives you an accurate picture of the performance of your future system.

To save time and effort, you probably opt for the simple solution and use the widespread Apache Benchmark (ab) or siege command-line tools to set up a load-test. Both generate load on a given URL and collect performance metrics. With a little more effort, siege even lets you provide a list of URLs and log in if necessary. The results from both tools are simple numbers that are easy to communicate: requests / second and average, minimum and maximum response time.

We often advise resisting the temptation of simple tools and numbers: they are suited for benchmarking, but you cannot trust the results of ab or siege to be a realistic simulation of real-world load on your system.

There are a number of reasons for this:

1. Both ab and siege only allow you to test a single, hardcoded path through your application. You can increase the number of simultaneous users that simulate this use-case, but they all do exactly the same thing. If you are load-testing a shop system, every simulated user visits the exact same category page and product page and searches for the same search terms.
The problem with this approach is that you will probably get a much higher cache hit ratio in all parts of your stack (reverse proxy, MySQL, Memcache, opcode cache, kernel) than under real-world traffic.
You need more randomness: simulated users should follow different paths, starting at and visiting different pages, to reproduce the real amount of cache misses.
2. The simulated click-path of your users covers just one usage scenario, so you have no fine-grained control over the realistic usage share of your features. In a shop, a much larger share of users just view products than actually check out and buy one.
In more complex load-testing setups, you would define several different scenarios such as anonymous user, search-engine bot, logged-in user, random traffic user, buying user and so on. Then you would configure your test to run different shares of each scenario to provide a more realistic model of your real-world traffic.
3. ab and siege cannot be used to define complex use-cases with form submission or multi-step processes (for example, a checkout). The code triggered by complex use-cases usually consumes more resources, which can affect the performance of other endpoints. Example: if your homepage can handle 100 users / second without traffic on any other pages, it may only handle 20 users / second as soon as other, heavier pages are requested at the same time.
4. The requests / second and average response time metrics are simple to understand and communicate, but in the end they are misleading.
With the requests / second metric, all you can actually say is "When our site is used with this unrealistic traffic pattern, we can handle so many users per second." Not very reassuring, given how far off real-world usage we are according to the three previous arguments.
Second, you should never use the average to analyse response times. The average is only a meaningful summary for roughly normally distributed data, but response times are almost always log-normally distributed or have several peaks caused by caching. Percentiles such as the 50th, 75th, 90th, 95th and 99th give you much better insight.
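
To illustrate why the average misleads, here is a minimal Python sketch on synthetic data. It assumes a hypothetical page where 80% of requests are cache hits around 20 ms and 20% are cache misses around 400 ms, each log-normally distributed; these numbers are invented for illustration, not measured. The percentile function uses the nearest-rank definition, which is only one of several common definitions.

```python
import math
import random
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def synthetic_response_times(n, seed=42):
    """Hypothetical data: 80% cache hits (~20 ms), 20% cache misses (~400 ms).

    Both peaks are log-normal. The numbers are made up, not measured.
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        if rng.random() < 0.8:
            times.append(rng.lognormvariate(math.log(20), 0.3))   # cache hit
        else:
            times.append(rng.lognormvariate(math.log(400), 0.5))  # cache miss
    return times


if __name__ == "__main__":
    times = synthetic_response_times(10_000)
    print(f"mean: {statistics.mean(times):7.1f} ms")
    for p in (50, 75, 90, 95, 99):
        print(f"p{p}:  {percentile(times, p):7.1f} ms")
```

With this mix, the mean lands far above the median, in a range where almost no request actually falls, while the percentiles reveal both the fast cache-hit majority and the slow cache-miss tail.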

Are there reasons to use ab or siege? Yes: when you develop locally on a specific page and try to optimize it using a benchmark, you can quickly get a relative comparison of its performance before and after a change under similar traffic conditions.

But if you want a realistic estimate of the traffic your production system can handle, you should use specialized tools such as Apache JMeter.

In addition to a UI where you can click together complex use-cases and scenarios with different load-generating strategies, JMeter is also fully programmable, so it can cover virtually any use-case. Learning JMeter does take time, but in return you gain the flexibility to run different detailed scenarios and get detailed data on every single request for later analysis.
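
The idea of randomized click-paths and weighted scenario shares can be sketched in a few lines of plain Python. This is not JMeter code (JMeter configures this through its own test-plan elements); the sketch only generates the request paths each simulated user would visit. The URL paths, scenario names and the 70/25/5 traffic split are hypothetical assumptions. Actually sending the requests, keeping session cookies and submitting the checkout forms is the part a real load-testing tool handles for you.

```python
import random
from collections import Counter

# Hypothetical shop catalogue; a real test would use IDs from production data.
CATEGORY_IDS = range(1, 51)
PRODUCT_IDS = range(1, 5001)
SEARCH_TERMS = ["shoes", "jacket", "socks", "bag", "hat", "gloves"]


def anonymous_browser(rng):
    """Visits the homepage, a random category and 1 to 4 random products."""
    paths = ["/", f"/category/{rng.choice(CATEGORY_IDS)}"]
    for _ in range(rng.randint(1, 4)):
        paths.append(f"/product/{rng.choice(PRODUCT_IDS)}")
    return paths


def searcher(rng):
    """Searches for a random term and opens one random product."""
    return [f"/search?q={rng.choice(SEARCH_TERMS)}",
            f"/product/{rng.choice(PRODUCT_IDS)}"]


def buyer(rng):
    """Multi-step checkout; each step would be a separate (often POST) request."""
    product = rng.choice(PRODUCT_IDS)
    return [f"/product/{product}", "/cart/add", "/checkout/address",
            "/checkout/payment", "/checkout/confirm"]


# Assumed traffic shares: far more browsing than buying.
SCENARIOS = [(anonymous_browser, 70), (searcher, 25), (buyer, 5)]


def simulate_sessions(n, seed=None):
    """Picks one scenario per simulated user, weighted by its traffic share."""
    rng = random.Random(seed)
    funcs = [func for func, _ in SCENARIOS]
    weights = [share for _, share in SCENARIOS]
    chosen = rng.choices(funcs, weights=weights, k=n)
    return [(func.__name__, func(rng)) for func in chosen]


if __name__ == "__main__":
    sessions = simulate_sessions(1000, seed=1)
    print(Counter(name for name, _ in sessions))
    distinct = {p for _, paths in sessions for p in paths if p.startswith("/product/")}
    print(f"{len(distinct)} distinct product pages requested")
```

Passing a seeded `random.Random` instance around keeps a test run reproducible while still spreading requests over many different pages, which is what makes the cache-miss rate more realistic than hammering one hardcoded URL.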

Qafoo GmbH - passion for software quality