mod_cband to the Rescue
The problem: A webserver with a lot of files that are to be public... and the public is downloading too much, too fast, too often... in what seems to be a malicious fashion... especially since everyone seems to be using multi-threaded download accelerators.
What is a multi-threaded download accelerator? If you or someone you know is a Microsoft Windows user, perhaps you have heard of FlashGet. For Linux, there is aget. Just what do these programs do? Let's imagine that you want to download one or more .iso images of the latest version of your favorite Linux distribution. You've found a website that has the images you are looking for. A multi-threaded download accelerator speeds up the download by opening multiple (often 10-20) simultaneous connections to the server, each of which fetches a different chunk of the file.
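To make the chunking concrete, here is a minimal Python sketch of how an accelerator might divide a file into byte ranges, one per connection. It only builds HTTP Range header values and sends nothing. The function name split_ranges and the even-split strategy are assumptions for illustration, not how FlashGet or aget actually decide their chunk sizes.

```python
def split_ranges(total_size, connections):
    """Split total_size bytes into contiguous, inclusive byte ranges.

    Returns HTTP Range header values such as "bytes=0-33". Each value would be
    sent on its own connection, and a server that honors it answers with
    206 Partial Content.
    """
    if total_size <= 0 or connections <= 0:
        raise ValueError("total_size and connections must be positive")
    connections = min(connections, total_size)  # never ask for empty ranges
    base, extra = divmod(total_size, connections)
    headers = []
    start = 0
    for i in range(connections):
        # The first `extra` chunks get one more byte so the sizes add up.
        length = base + (1 if i < extra else 0)
        end = start + length - 1  # Range end offsets are inclusive
        headers.append(f"bytes={start}-{end}")
        start = end + 1
    return headers


if __name__ == "__main__":
    # A 100-byte file fetched over 3 simultaneous connections.
    for value in split_ranges(100, 3):
        print("Range:", value)
```

Each header value becomes a separate request, and therefore a separate 206 line in the server's access log.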
Here's what it looks like from the webserver's perspective:
22.214.171.124 - - [24/Dec/2006:04:21:27 -0700] "GET /windows/eclipse-SDK-3.2.1-win32.zip HTTP/1.1" 206 91040039 "http://img.cs.montana.edu/windows/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
126.96.36.199 - - [24/Dec/2006:04:21:59 -0700] "GET /windows/eclipse-SDK-3.2.1-win32.zip HTTP/1.1" 206 90999383 "http://img.cs.montana.edu/windows/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
188.8.131.52 - - [24/Dec/2006:04:21:09 -0700] "GET /windows/eclipse-SDK-3.2.1-win32.zip HTTP/1.1" 206 3958262 "http://img.cs.montana.edu/windows/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
I've only shown three entries... but in reality, there are 10-20 simultaneous connections. Depending on how big the target file is and what size the client makes the chunks, there can be hundreds of log entries for a single downloaded file. If there are 10 hosts downloading, that translates to 100-200 connections, with a separate httpd process servicing each of them. That is a lot of system resources, and a lot of bandwidth too.
Notice the 206 status code. If you see a lot of those in your log, you are probably getting hit by download accelerators. Here's what 206 means:
206 - Partial Content
A status code of 206 is the response to a request for part of a document. The client sends a Range header asking for specific bytes, and the server returns just that section instead of the whole file. Caching tools, resumed downloads and download accelerators all rely on it.
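If you want to check your own logs for this pattern, a short Python script can tally 206 responses per client. This is a sketch that assumes Apache's "combined" log format, which is what the entries above look like; the script name count206.py and the top-10 cutoff are arbitrary choices.

```python
import re
import sys
from collections import Counter

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "agent".
# Only the fields up to the byte count are needed here.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def partial_content_by_host(lines):
    """Return (requests, bytes) Counters for 206 responses, keyed by client host."""
    requests = Counter()
    sent = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("status") != "206":
            continue  # skip unparseable lines and anything that isn't a 206
        host = m.group("host")
        requests[host] += 1
        if m.group("bytes") != "-":
            sent[host] += int(m.group("bytes"))
    return requests, sent


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python count206.py /var/log/httpd/access_log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        requests, sent = partial_content_by_host(log)
    for host, count in requests.most_common(10):
        print(f"{host:>15}  {count:6d} partial requests  {sent[host]:>14,d} bytes")
```

A handful of hosts with hundreds of partial requests each for the same file is the signature described above.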
The machine I was having a problem with was actually handling the load just fine. In fact, I didn't even notice anything out of the ordinary. I happen to run monit on the server in question. Not to go off on too much of a tangent, but monit is a really handy application that can be configured to watch selected server applications and take action when certain thresholds are met. Every couple of days, I was getting emails from monit telling me that it had restarted apache because it had reached the configured maximum number of processes, which I had set to 100.
All of those connections add up to a significant chunk of bandwidth. That was the real problem... all of the bandwidth the machine was continuously handing out to the outside world. I set up the server as an httpd-based file server, mostly for machines on the LAN, so I could access all of the Linux .isos as well as run various distro repositories for software installs and updates. It was intended to save bandwidth, since all of my lab machines would do everything over the LAN rather than downloading from the Internet. I also wanted to be able to access everything from off the LAN.
Sure, I could password protect everything, but I wanted it to be a public resource... and now the public was really abusing the machine. Surely there must be a way to configure apache to:
- Limit the number of simultaneous connections per client
- Limit the amount of bandwidth a client can consume
- Limit how much a client can download over a given period
- Limit the amount of bandwidth the server will commit to off-LAN downloads
- Differentiate between clients on and off the LAN, so that the limits do not apply to LAN clients (or a different set of limits applies)
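Bandwidth limits like the second and fourth items are usually enforced with some kind of rate limiter. Purely as an illustration, here is the classic token-bucket technique in Python. It is a generic sketch, not a description of how mod_cband meters bandwidth internally; the class, the allow helper and the 25 KiB/s default are all hypothetical.

```python
class TokenBucket:
    """Generic token-bucket limiter; an illustration, not mod_cband's internals.

    Tokens stand for bytes. They refill at `rate` bytes per second up to
    `capacity`, so a client may burst up to `capacity` bytes and is then
    held to `rate` bytes per second on average.
    """

    def __init__(self, rate, capacity, now=0.0):
        self.rate = float(rate)
        self.capacity = float(capacity)
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def _refill(self, now):
        # Only move forward in time; an older timestamp adds no tokens.
        if now > self.last:
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now

    def try_send(self, nbytes, now):
        """If nbytes fit at time `now`, deduct them and return True; else False."""
        self._refill(now)
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


# Hypothetical per-client table: one bucket per IP gives a per-client limit.
# A single bucket shared by every off-LAN client would instead cap the total
# bandwidth the server commits to off-LAN downloads.
buckets = {}


def allow(client_ip, nbytes, now, rate=25 * 1024, capacity=25 * 1024):
    """Check nbytes for client_ip against its own bucket (default: 25 KiB/s)."""
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(rate, capacity, now)
    return bucket.try_send(nbytes, now)
```

In a real server, now would come from time.monotonic(), and a request that doesn't fit would be delayed rather than refused.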
The apache 1.x series had mod_throttle, but it had since been abandoned and was not compatible with the apache 2.x series. After a little bit of searching, I learned about mod_cband. Here is a slightly modified description of mod_cband taken from
mod_cband is an Apache 2 module provided to solve the problem of limiting users' and virtualhosts' bandwidth usage. The current versions can set virtualhosts' and users' bandwidth quotas, maximal download speed (like in mod_bandwidth), requests-per-second speed and the maximal number of simultaneous IP connections (like in
mod_cband is especially handy for use by hosting companies which would like to limit data transfer for their users, such as "10 Gb of traffic per month". There already was the mod_curb module, which can limit data transfers, but it doesn't work with virtualhosts and Apache 2.
What follows isn't intended to be a HOWTO, as it is not in-depth enough, but it will serve as an introduction.
Download the LATEST source for mod_cband. At the time of writing, it was 0.9.7.5.
Before you try to compile mod_cband, make sure httpd-devel is installed. I'm working with CentOS machines (some even inside of an OpenVZ VPS), and I didn't have httpd-devel installed... nor did I have a compiler... so I just did the following:
yum install httpd-devel
yum did its job by figuring out all of the dependencies, downloading everything and installing it. In my particular instance, that came to about 12 packages totalling approximately 15MB, which took about 1-2 minutes.
mod-cband-0.9.7.5.tgz wherever you want (I like
cd into the directory where the source is and do the following:
The whole process only took about a minute. The make install step even modified /etc/httpd/conf/httpd.conf so that apache would load mod_cband. Before restarting apache, though, I'll make a few more changes to its config.
After mod_cband was installed, I removed httpd-devel and all of the dependencies that were installed with it. I don't like having a compiler on a webserver, because attackers who find a way to exploit apache or some bad PHP code can use it to compile their own kits. The contents of /var/log/yum.log come in handy for reversing things. :)
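The undo step can be scripted. The sketch below assumes each yum.log line looks like "<month> <day> <time> Installed: <package>" (the exact layout varies between yum versions, so check your own file first). It only prints a yum remove command for review instead of running it, and the script name undo_yum.py is hypothetical.

```python
import re
import sys

# ASSUMED line layout (varies by yum version, so check yours):
#   "<Mon> <DD> <HH:MM:SS> Installed: <package>"
INSTALLED = re.compile(r"^\w{3} +\d+ [\d:]+ Installed: (\S+)")


def installed_packages(lines, date_prefix):
    """Return packages logged as 'Installed' on lines that start with date_prefix."""
    packages = []
    for line in lines:
        if not line.startswith(date_prefix):
            continue
        m = INSTALLED.match(line)
        if m:
            # Some yum versions prefix an epoch ("1:name-..."); drop it.
            packages.append(re.sub(r"^\d+:", "", m.group(1)))
    return packages


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example (hypothetical script name): python undo_yum.py /var/log/yum.log "Dec 24"
    with open(sys.argv[1]) as log:
        packages = installed_packages(log, sys.argv[2])
    # Only print the command so it can be reviewed; nothing is removed here.
    print("yum remove " + " ".join(packages))
```

A date prefix also matches anything else installed that day, which is why the output is meant to be read before it is run.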
mod_cband does everything from a Virtual Host perspective... so if your apache isn't running any virtual hosts at all, you'll need to configure at least one, which will act as the default site. Most of the configuration for mod_cband is done within the <VirtualHost></VirtualHost> container, but some settings are global and go outside of any virtual host container. Insert the following somewhere in your config:
Here's what my virtual host container for the server in question looks like:
CBandClassRemoteSpeed lan_class 1024kb/s 50 100
CBandRemoteSpeed 25kb/s 3 3
Please note that I make reference to lan_class, which is a global definition, so it is contained outside of the virtual host container and looks like this:
You can define as many classes as you like. Once a class is defined, you can refer to it in a virtual host container.
I'm not going to bother to explain what each mod_cband configuration directive means. That is what the documentation is for... and it does a much better job than I could. To summarize, lan_class covers the IP addresses of all of the machines on our LAN. The CBandClassRemoteSpeed and CBandRemoteSpeed lines each specify three limits per remote client: maximum bandwidth, number of requests per second, and number of connections. Clients in lan_class get 1024kb/s, 50 requests per second and 100 connections; everyone else gets 25kb/s, 3 requests per second and 3 connections.
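The decision those two directives express can be modeled in a few lines of Python. This is only a model of the policy, not mod_cband's implementation: the 192.168.0.0/16 network is a hypothetical stand-in for whatever addresses lan_class covers, the speeds are kept as the literal strings from the config, and the function names are made up.

```python
import ipaddress

# Hypothetical LAN range; substitute whatever your lan_class actually covers.
LAN = [ipaddress.ip_network("192.168.0.0/16")]

# (speed, requests per second, max connections), copied from:
#   CBandClassRemoteSpeed lan_class 1024kb/s 50 100
#   CBandRemoteSpeed 25kb/s 3 3
LAN_LIMITS = ("1024kb/s", 50, 100)
REMOTE_LIMITS = ("25kb/s", 3, 3)


def limits_for(client_ip):
    """In this model, a client in the LAN class gets the class limits, anyone else the default."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in LAN):
        return LAN_LIMITS
    return REMOTE_LIMITS


def may_open_connection(client_ip, open_connections):
    """True if one more simultaneous connection stays within the client's cap."""
    _speed, _rps, max_conn = limits_for(client_ip)
    return open_connections < max_conn
```

Under this model, an accelerator on the Internet that already holds 3 connections is refused a 4th, while a lab machine on the LAN can open up to 100.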
Remember, whenever you change the apache config, restart httpd for the change to take effect. The <Location> section specifies a URL for the mod_cband status page, which you can browse to see all kinds of status information and running totals.
Here's what it looks like:
You should be able to get a glimpse of some of the things you can do with mod_cband, as well as imagine what monitoring features its status page gives you.
mod_cband did everything I wanted it to do, was easy to install and configure, and just works. As far as features go, I've only scratched the surface with my needs... it offers a ton of other features that are well suited to hosting companies who want to limit the bandwidth their virtual hosts use. If you have the need or desire, give it a try and let us know how your experience went.