Recently I spent some time setting up a rather large scale deployment of Drupal. It involved a pair of blade servers with MySQL master/slave replication and a load balancer in front of them. At first it seemed like no big deal really. A pair of dedicated servers to run Drupal would serve up just about anything pretty quickly. But then things got a little interesting.
See, the Drupal setup was to be a content source for another web application. Now this other web application is installed on one of the most impressive arrays of UltraSparc-powered servers I've ever seen. I mean we're talking four T2000 web servers, load balanced, with 16GB of memory each, all supported by a pair of monstrous resource servers and an Oracle database. The setup cost roughly half a million dollars. And we were told "make sure that Drupal can keep up with these servers, we're expecting a max load of 10,000 concurrent authenticated users at once". Now I know you're thinking "10,000? That's it?" and that's what I said at first. But then I realized something. Each user that logs into the big bad cluster will load their content from Drupal, and each page they view has at least 7 elements which must be requested from Drupal. So that means a load of 70,000 requests at once. Now to put this into perspective, they asked us to configure a pair of blade servers, costing no more than $10,000, with hardware specs well below even the lowest-end server within the $500,000 Solaris server cluster monster. And of course we said "sure, we can do that". Here's how we did it.
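To make the arithmetic explicit, here is a minimal sketch of the load estimate. The function and variable names are made up for illustration; the only inputs are the two figures quoted above, and since each page has "at least" 7 elements, the result is a lower bound.

```python
# Figures quoted in the post; the names are illustrative, not from any tool we used.
concurrent_users = 10_000
elements_per_page = 7  # "at least 7", so the result is a lower bound


def drupal_requests(users: int, elements: int) -> int:
    """Requests hitting Drupal if every user loads one page at the same moment."""
    return users * elements


peak_requests = drupal_requests(concurrent_users, elements_per_page)
print(peak_requests)  # 70000
```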
We first sat down and talked about how Drupal works and where its own bottlenecks reside. This was an easy question to answer: the database. Drupal is a database-heavy application. Everything (and I mean everything) is in the database. If we were going to get up to 10,000 concurrent requests with Drupal, let alone 70,000, we were going to have to speed up MySQL's ability to return information. The second thing that Drupal needs, obviously, is PHP. So this meant speeding up Apache's capacity to read and run PHP code. Lastly, and probably the most obvious, is the simple fact that Apache has to be able to send the HTTP responses generated by Drupal as quickly as it can. With these three aspects in mind, we set out to build our "David" to go up against "Goliath".
Our research first brought us to this website: http://2bits.com/articles/benchmarking-drupal-with-php-op-code-caches-apc-eaccelerator-and-xcache-compared.html
This basically confirmed our previous analysis. We needed to start caching each level of the page return process if we were ever going to come close to meeting our requirements. After reading up on some docs, consulting some other large scale Drupal setups and looking into what modules we could use with Drupal for caching, we chose the following: Memcached, APC and Squid.
They would be our three layers of defense. Memcached would save the results of our database queries in memory, relieving some of the load on MySQL. APC would save the compiled PHP op-code in memory, saving Apache from having to re-read each PHP file and converting it into byte code on each page request. And finally, Squid would cache the resulting generated pages and spit back HTTP responses with those pages blazingly fast.
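As a rough illustration of what the Memcached layer buys you, here is a minimal cache-aside sketch. It is not how Cache Router is implemented: a plain dictionary stands in for Memcached, and `fake_db` is a hypothetical stand-in for a MySQL query. The point is only that repeated identical lookups stop reaching the database.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Cache-aside wrapper: check memory first, fall back to the database."""

    def __init__(self, run_query: Callable[[str], str]) -> None:
        self._run_query = run_query        # stand-in for a real MySQL call
        self._store: Dict[str, str] = {}   # stand-in for Memcached
        self.hits = 0
        self.misses = 0

    def get(self, sql: str) -> str:
        if sql in self._store:
            self.hits += 1
            return self._store[sql]
        # Miss: ask the database once, then remember the answer.
        self.misses += 1
        result = self._run_query(sql)
        self._store[sql] = result
        return result


db_calls: List[str] = []


def fake_db(sql: str) -> str:
    """Hypothetical database: records every query that actually reaches it."""
    db_calls.append(sql)
    return f"rows for: {sql}"


cache = QueryCache(fake_db)
for _ in range(7):  # e.g. seven page elements asking for the same data
    cache.get("SELECT title FROM node WHERE nid = 1")

print(len(db_calls), cache.hits, cache.misses)  # 1 6 1
```

A real setup also needs expiry and invalidation when content changes, which this sketch leaves out.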
Both APC and Squid can exist on their own on the webserver without the need to configure your application to use them. Each has well-documented configuration instructions on the web so I won't go over them here (if you'd like more details, leave a comment and I'll add some information about our configuration). But Drupal needs to be told to use Memcached. To do that, we used the Cache Router module. The configuration was surprisingly simple, but make sure you RTFM. You have to make changes to settings.php to get this to work properly. In the case of Memcached, you'd add something along these lines into your settings.php file:
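// Swap Drupal's default cache include for Cache Router's.
$conf['cache_inc'] = './sites/all/modules/cacherouter/cacherouter.inc';
$conf['cacherouter'] = array(
  // Settings applied to every cache table that has no entry of its own.
  'default' => array(
    'engine' => 'memcache',
    // host:port of the Memcached daemon (11211 is Memcached's default port).
    'server' => array('localhost:11211'),
    'shared' => TRUE,
    'prefix' => '',
    'path' => 'sites/default/files/filecache',
    'static' => FALSE,
    'fast_cache' => TRUE,
  ),
);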
This tells Drupal that for all default cache tables, use Memcached to store them in memory. You can actually use different caching tools for different tables. There's even some discussion in the forums on the project page to get primary and secondary caches built for Drupal. That way if one of your caches expires and needs to be rebuilt or the load gets too much for one tool, you have the other to take up the slack.
So with Cache Router installed and APC running, we decided to start doing a few tests just to see where we were at. Squid can take a lot of time to configure properly so we wanted to see how much improvement we'd made thus far. We ran some tests on the site prior to any of the caching tools being installed to use as our baseline. We used jMeter, an Apache project, to do our tests. The upside to the tool is that since it's written in Java, you can run it just about anywhere. The downside to the tool is the fact that it's written in Java. We had lots of fun with out-of-memory issues in our tests and, if you check online, we aren't the only ones to have memory problems with jMeter. But it's free and did the job reasonably well.
Just a note, because the load balancer wasn't ready when we started testing, we ran all the tests on only one of the blade servers. So in theory, as long as the load balancer isn't a bottleneck, the results should just about double with it in place.
Our baseline test gave us the following results:
Requests: 10,000
Avg Throughput/min: 9,792
Avg Response Time: 3491ms
For a first test, not bad. Total test time was about a minute. We reached 10,000 with just the horsepower of the blade server. What concerned us though was the response time. An average of 3.5 secs per request wasn't going to cut it. So we took the next logical step: turn on Drupal's built-in Page Cache and see what happens. Page cache basically just saves the resulting generated page in the database and spits it out when needed.
Drupal Page Cache
Requests: 10,000
Avg Throughput/min: 20,435
Avg Response Time: 270ms
We were a little surprised at this result. We didn't think the built in cache would generate such drastic improvements. Doubled the throughput, half the time to complete the test and more than 10x faster response times. This was good news. Loading the pages from MySQL seemed to work pretty well. But if we have 7x the number of requests to the database, we might run out of available connections to the database. So to move forward, we turned on Memcached.
When we started to test Memcached, we noticed something. The tests were taking longer to be generated than it took the server to respond to each. Good news for the server's performance, bad news for our testing machine. It couldn't generate enough load on the Drupal server anymore. So we stepped things up. We grabbed one of the 8 core, 16GB monster web servers from the Solaris cluster to start testing Drupal. We figured it had enough horsepower to continue testing. We were right. We stepped up the number of requests to 40,000 and got the following results.
Memcached
Requests: 40,000
Avg Throughput/min: 31,768
Avg Response Time: 130ms
Think about this for a second: we quadrupled the number of requests and we still managed to get about 55% more throughput per minute than the Page Cache run, and halved the response time. And on top of that, we're getting closer to our 70,000 magic number. So the next step was to test APC's effect on the numbers.
Now another special note: APC actually increased the speed enough that the one server couldn't generate tests fast enough once again. However, when we tried to increase the number of requests and add extra servers, we ran into more Java memory problems. Not only could we not generate any more requests above 40,000 per server, the amount of data returned overloaded the jMeter analysis tool we were using. So the results below are probably the least reliable of all our tests, but they do show a trend. Things got faster again.
APC + Memcached
Requests: 40,000
Avg Throughput/min: 71,435
Avg Response Time: 30ms
We did it! We reached 70,000+ requests. We haven't even installed Squid yet and we reached our goal. On top of that, the Drupal server was responding so fast that the network it was running on couldn't send enough data at it to keep pushing it any further. At this point, we decided that adding Squid would probably be pointless. Not only can we not test anything beyond what we've already done, the network wouldn't be able to handle more tests.
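For a side-by-side view, here is a small sketch that compares each run against the baseline, using only the four sets of numbers reported above. Treat the ratios as rough: the runs used different request counts and different load-generating machines, and the last run is the least reliable. The `compare` helper is just an illustrative name.

```python
# (label, requests, avg throughput per minute, avg response time in ms)
results = [
    ("Baseline",          10_000,  9_792, 3491),
    ("Drupal Page Cache", 10_000, 20_435,  270),
    ("Memcached",         40_000, 31_768,  130),
    ("APC + Memcached",   40_000, 71_435,   30),
]


def compare(runs):
    """Return (label, throughput multiple, response-time speedup) vs. the first run."""
    base_throughput, base_response = runs[0][2], runs[0][3]
    rows = []
    for label, _requests, throughput, response_ms in runs:
        rows.append((
            label,
            round(throughput / base_throughput, 2),
            round(base_response / response_ms, 1),
        ))
    return rows


for label, throughput_x, response_x in compare(results):
    print(f"{label:<18} throughput x{throughput_x:<5} response time x{response_x} faster")
```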
That's where we left things. With just APC and Memcached we managed to match the requirements of the Solaris monster. And don't forget, we only used one of the two blade servers to do these tests. That means we met our goal with room to spare and room to grow. The lesson in all this, if there is one, is that you don't necessarily need to spend a ton of money on super powerful servers. If you take some time to think about things and tweak your configuration, you can get a lot more out of Drupal and your server than you might think.
why not using ab
Why use jMeter and expose yourself to all these memory issues when you can use the ab command line tool (apache2-utils)? I'd be interested to know how it would perform on your server. Did you run all your benchmarks on the same network (benchmark monitor and benchmarked server)?
<a href="http://openspring.net/">scor</a>
Nice setup
We are currently working with a somewhat similar setup: a replicated MySQL server and a few webservers with APC and Memcache using cacherouter, with Squid as reverse proxy/cache keeping the users at bay.
Now my question is, how did you go about user sessions? I am currently having some weird login problems, like the user getting logged out right after logging in, or getting logged out after editing something.
So... how do you deal with the 4 levels of caching? drupal+apc+memcache+squid?
We didn't run into any issues
--
Mathew Winstone
Director of Operations
What config of php for Apache did you use?
Great article, thanks for taking the time to write it and sharing it.
For running PHP in Apache, did you use plain mod_php or choose to go with mod_fcgid (which I gather renders APC's performance improvement kind of useless but does a better job at memory and resource handling)?
Thanks again,
Carlos Miranda Levy
www.socinfo.com
Plain ol' mod_php.
--
Mathew Winstone
Director of Operations
Impressive!
I am right now working on a new site and we're planning on doing something similar (without the huge "other application" in front)
I would really love to know how you dealt with your configs: Apache, PHP, MySQL... did you go InnoDB on some tables? What tables did you use APC for? And Memcache?
awesome information. thanks
awesome information. thanks for taking the time to put this together.
steve rude (slantview)