The Next Apache
APACHE TEAM MEMBERS TALK ABOUT THE BIGGEST UPGRADE TO THE BIGGEST PROGRAM ON THE WEB
By Jeffrey Carl
The Apache webserver is (along with Linux and Perl) probably the most widely used open-source software in the world. After beginning as a set of patches to the NCSA http server (“a patchy server” was how it got its name), Apache had become the most popular http server out there by 1996. According to Netcraft, Apache today powers 59 percent of all the webservers on the Internet, far more than the 20 percent share of the runner-up, Microsoft Internet Information Server.
Apache’s last “full-point” release (Apache 1.0) came out on December 1, 1995, and it has been five years since then. Naturally, there’s a lot of excitement about the long-awaited Apache 2.0, which should be in beta release by the time you read this. To find out what’s new with Apache 2, I asked the Apache Project’s Dirk-Willem van Gulik and Ryan Bloom. The following are selected portions of an e-mail interview with these Apache team members:
Boardwatch: Why was the Apache 2.0 project started? What shortcomings in Apache 1.x was it created to address, or what missing features was it designed to implement?
Bloom: There were a couple of reasons for Apache 2.0. The first was that a pre-forking server just doesn't scale on some platforms. One of the biggest culprits was AIX, and since IBM had just made a commitment to using and delivering Apache, it was important to them that we get threads into Apache. Since forcing threads into 1.3 would be a very complex job, starting 2.0 made more sense. Another problem was getting Apache to run cleanly on non-Unix platforms. Apache has worked on OS/2 and Windows for a long time, but as we added more platforms (Netware, Mac OS X, BeOS) the code became harder and harder to maintain.
Apache 2.0 was written with an eye towards portability, using the new Apache Portable Runtime (APR), so adding new platforms is simple. Also, by using APR we are able to improve performance on non-Unix platforms, because we are using native function calls on all platforms. Finally, in Apache 2.0, we have implemented Filtered I/O. This is a feature that module writers have been requesting for years. It basically allows one module to modify the data from another module. This allows CGI responses to be parsed for PHP or SSI tags. It also allows the proxy module to filter data.
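Here is a rough way to picture filtered I/O. The following is a minimal Python model, not Apache's actual filter API, which is written in C. A content generator stands in for a CGI script and yields chunks of output. Each filter is a function that takes a stream of chunks and yields a modified stream. The names `cgi_output`, `make_ssi_filter` and `run_chain` are hypothetical, and the SSI handling covers only simple `echo` directives. It assumes that no directive is split across two chunks.

```python
import re
from typing import Callable, Iterable, Iterator

# A filter takes a stream of chunks and yields a (possibly modified) stream.
Filter = Callable[[Iterable[str]], Iterator[str]]

# Matches SSI-style echo directives such as <!--#echo var="NAME" -->.
SSI_ECHO = re.compile(r'<!--#echo var="([A-Za-z_]+)" -->')


def cgi_output() -> Iterator[str]:
    """Hypothetical content generator standing in for a CGI script."""
    yield "<p>Hello from CGI.</p>\n"
    yield '<p>Server: <!--#echo var="SERVER_SOFTWARE" --></p>\n'


def make_ssi_filter(env: dict[str, str]) -> Filter:
    """Build a filter that expands echo directives using values from env.

    Simplification: assumes a directive never spans two chunks.
    """
    def ssi_filter(chunks: Iterable[str]) -> Iterator[str]:
        for chunk in chunks:
            yield SSI_ECHO.sub(lambda m: env.get(m.group(1), "(none)"), chunk)
    return ssi_filter


def run_chain(source: Iterable[str], filters: list[Filter]) -> str:
    """Pass the generator's output through each filter in order."""
    stream: Iterable[str] = source
    for f in filters:
        stream = f(stream)
    return "".join(stream)


if __name__ == "__main__":
    env = {"SERVER_SOFTWARE": "Apache/2.0"}
    print(run_chain(cgi_output(), [make_ssi_filter(env)]))
```

The generator never needs to know which filters, if any, come after it. That separation is what lets one module post-process another module's output.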
Boardwatch: What are the significant new features of Apache 2.0?
Bloom: Threading, APR, Filtered I/O. :-) And Multi-Processing modules and Protocol modules. These are two new module types that allow module writers more flexibility. A Multi-Processing module basically defines how the server is started, and how it maps requests onto threads and processes. This is an abstraction between Apache's execution profile and the platform it is running on. Different platforms have different needs, and the MPM interface allows porters to define the best setup for their platform. For example, Windows uses two processes. The first monitors the second. The second serves pages. This is done by a Multi-Processing Module.
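An MPM can be thought of as a strategy object. The server core hands requests to it, and the MPM decides how those requests map onto threads and processes. The Python toy below is only an illustration, not the MPM interface itself. The `MPM` class, its `serve` method and the `handle` function are hypothetical. The mapping policy is whatever executor the MPM is built with.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable


class MPM:
    """Hypothetical model of a Multi-Processing Module.

    The server core only calls serve(); how requests are mapped onto
    threads or processes is decided by the executor this MPM creates.
    """

    def __init__(self, make_executor: Callable[[], Executor]) -> None:
        self.make_executor = make_executor

    def serve(self, handler: Callable[[str], str], requests: list[str]) -> list[str]:
        # Start the workers, hand every request to them, then shut them down.
        with self.make_executor() as pool:
            return list(pool.map(handler, requests))


def handle(path: str) -> str:
    """Stand-in request handler, defined at module level so it is picklable."""
    return f"200 OK {path}"


# A threaded layout: four worker threads inside one process.
threaded = MPM(lambda: ThreadPoolExecutor(max_workers=4))

if __name__ == "__main__":
    print(threaded.serve(handle, ["/", "/index.html", "/about"]))
```

Passing `lambda: ProcessPoolExecutor(max_workers=4)` instead would give a process-based layout without any change to `handle` or its caller. That is roughly the kind of platform-specific choice the MPM interface leaves to porters. A process pool needs a picklable handler, which is why `handle` is defined at the top level.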
Protocol modules are modules that allow Apache to serve more than just HTTP requests. In this respect, Apache can act like inetd on steroids. Basically, each thread in each process can handle either HTTP or FTP or BXXP or WAP requests at any time, as long as those protocol modules are in the server. This means no forking a new process just to handle a new request type. If 90% of your site is served by HTTP, and 10% is served by WAP, then the server automatically adjusts to accommodate that. As the site migrates to WAP instead of HTTP, the server continues to serve whatever is requested. There is no extra setup involved. I should mention now that only an HTTP protocol module has been written, although there is talk of adding others.
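A protocol module can be pictured as an entry in a dispatch table. Whichever worker accepts a connection looks up the handler for that connection's protocol and runs it, so no new process is needed for each protocol. This Python sketch is illustrative only. The `register_protocol` decorator, the `Connection` record and the idea that the listener tags each connection with a protocol name are all assumptions, not Apache 2.0's interface. As Bloom notes, only HTTP had actually been written, so the FTP entry is purely hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical connection record: the listener tags each one with a protocol.
@dataclass
class Connection:
    protocol: str
    payload: str


ProtocolHandler = Callable[[Connection], str]
PROTOCOLS: dict[str, ProtocolHandler] = {}


def register_protocol(name: str) -> Callable[[ProtocolHandler], ProtocolHandler]:
    """Decorator a 'protocol module' would use to plug itself into the server."""
    def decorator(fn: ProtocolHandler) -> ProtocolHandler:
        PROTOCOLS[name] = fn
        return fn
    return decorator


@register_protocol("http")
def http_handler(conn: Connection) -> str:
    return f"HTTP/1.1 200 OK ({conn.payload})"


@register_protocol("ftp")  # hypothetical: no FTP module existed yet
def ftp_handler(conn: Connection) -> str:
    return f"226 Transfer complete ({conn.payload})"


def worker(conn: Connection) -> str:
    """Any worker can serve any protocol that has been registered."""
    handler = PROTOCOLS.get(conn.protocol)
    if handler is None:
        return "unsupported protocol"
    return handler(conn)


if __name__ == "__main__":
    for c in [Connection("http", "GET /"), Connection("ftp", "RETR a.txt")]:
        print(worker(c))
```

Because the mix of protocols is decided per connection, a site whose traffic shifts from one protocol to another needs no reconfiguration. The same pool of workers simply serves whatever arrives.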
Boardwatch: How much of a break with the past is Apache 2.0, in terms of 1.) the existing code base, 2.) the administration interface, and 3.) the API for modules?
Bloom: I'll answer this in three parts.
1) The protocol handling itself is mostly the same. How the server starts and stops, generates data, sends data, and does anything else is completely different. By adding threads to the mix, we realized that we needed a new abstraction to allow different platforms to start up differently and to map a request to a thread or process the best way for that platform.
2) The administrative interface is the same. We still use a text file, and the language hasn't changed at all. We have added some directives to the config file, but if somebody wants to use the prefork MPM (it acts just like 1.3), then a 1.3 config file will migrate seamlessly into 2.0. If the MPM is changed, then the config file will need slight modifications to work. Also, some of the old directives no longer do what they used to. The definition of the directive is the same, but the way the code works is completely different, so they don't always map. For example, SetHandler isn't as important as it once was, but the Filter directives take its place.
3) The module API has changed a lot. Because we now rely on APR for most of the low-level routines, like file I/O and network I/O, the module doesn't have as much flexibility when dealing with the OS. However, in exchange, the module has greater portability. Also, the module structure that is at the bottom of every module has shrunk to about 5 functions. The others are registered with function calls. This allows the Apache group to add new hooks without breaking existing modules. Also, with the filter API, modules should take more care to generate data in sizable chunks.
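The move from a large, fixed module structure to hooks registered through function calls can be sketched as follows. The point is that the server can add a new hook name in a later release without changing the module record, so modules written earlier keep working. `HookRegistry`, `Module` and `example_module` are hypothetical Python stand-ins for Apache 2.0's C structures. The hook names are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


class HookRegistry:
    """Hypothetical server-side table mapping hook names to callbacks."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Callable[..., object]]] = defaultdict(list)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._hooks[name].append(fn)

    def run_all(self, name: str, *args: object) -> list[object]:
        # A hook that no module registered for simply yields no results.
        return [fn(*args) for fn in self._hooks.get(name, [])]


@dataclass
class Module:
    """The small, fixed per-module record: a name and one registration callback."""
    name: str
    register_hooks: Callable[[HookRegistry], None]


def _example_hooks(reg: HookRegistry) -> None:
    # The module asks only for the hooks it cares about.
    reg.register("handler", lambda uri: f"example handled {uri}")


example_module = Module("example_module", _example_hooks)

if __name__ == "__main__":
    server = HookRegistry()
    example_module.register_hooks(server)
    print(server.run_all("handler", "/index.html"))
    # A hook added in a later server release leaves the old module unaffected.
    print(server.run_all("post_config"))
```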
Boardwatch: What are the advantages and disadvantages (if any) of Apache 2.0's multithreaded style? What does it mean to have the option of being multi-process and multi-threaded?
Bloom: The multi-threading gives us greater scalability. I have actually seen an AIX box go from being saturated at 500 connections to being saturated at more than 1000. As for disadvantages, you lose some robustness with this model. If a module segfaults in 1.3, you lose one connection, the connection currently running on that process. If a module segfaults in 2.0, you lose N connections, depending on how many threads are running in that process, which MPM you have chosen, etc. However, we have different MPMs distributed with the server, so a site that only cares about robustness can still use the 1.3 pre-forking model. A site that doesn't care about robustness or only has VERY trusted code can run a server that has more threads in it.
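Bloom's robustness tradeoff comes down to simple arithmetic. Suppose a server runs P child processes with T threads each. It can hold up to P × T connections, and a crash in one child drops at most T of them. The helper below compares a 1.3-style prefork layout with a threaded layout of the same capacity. `crash_impact` is a hypothetical name, the process and thread counts are made-up examples, and the model is simplified by assuming every thread is busy when the crash happens.

```python
def crash_impact(processes: int, threads_per_process: int) -> tuple[int, float]:
    """Return (connections lost, fraction of capacity lost) when one child crashes.

    Assumes every thread is serving a connection at the moment of the crash.
    """
    capacity = processes * threads_per_process
    lost = threads_per_process
    return lost, lost / capacity


if __name__ == "__main__":
    # 1.3-style prefork: 1000 single-threaded children.
    print(crash_impact(1000, 1))  # (1, 0.001)
    # Threaded layout with the same capacity: 40 children x 25 threads.
    print(crash_impact(40, 25))   # (25, 0.025)
```

Both layouts serve the same number of connections, but a single crash costs 25 times as many in the threaded case. This is the robustness price paid for the scalability of threads.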
van Gulik: In other words, Apache 2.0 allows the webmaster to make his or her own tradeoffs between scalability, stability and speed. This opens a whole new world of Quality of Service (QoS) management. Another advantage of these flexible process management models is that integration with languages like Perl, PHP, and in particular Java, can be made cleaner and more robust without losing much performance. Large e-commerce integration projects in particular will be significantly easier.
Boardwatch: What is the Apache Portable Runtime? What effect does this have on the code, and on portability?
Bloom: The Apache Portable Runtime is exactly what it says. :-) It is a library of routines that Apache is using to make itself more portable. This makes the code much more portable and shrinks the code size, making the code easier to maintain. My favorite example is apachebench, a simple benchmarking tool distributed with the server. AB has never worked on anything other than Unix. We ported it to APR, and it works on Unix, Windows, BeOS, OS/2, etc., without any additional work. As more platforms are ported to APR, AB will just work on them, as will Apache. This also improves our performance on non-POSIX platforms. Apache on Windows, the last I checked, is running as fast as Apache on Linux.
Boardwatch: Can Apache 1.3.x modules be used with Apache 2.0? How will this affect things like Apache-PHP or Apache-FrontPage?
Bloom: Unfortunately, no. However, they are very easy to port. I have personally ported many complex modules in under an hour. The PHP team is already working on a port of PHP to 2.0, as is mod_perl. Mod_perl has support for some of the more interesting features already, such as writing filters in Perl, and writing protocol modules in Perl. FrontPage will hopefully be subsumed by DAV, which is now distributed with Apache 2.0.
van Gulik: Plus, it is likely that various Apache-focused companies, such as IBM, Covalent and C2/RedHat, will assist customers with the transition with specific tools and migration products.
Boardwatch: Do you predict that server administrators used to Apache 1.3.x will have a hard time adjusting to anything about Apache 2.0? If so, what?
Bloom: I think many admins will move slowly to Apache 2.0. Apache 2.0 has a lot of features that people have been asking for for a very long time. The threading issues will take some getting used to, however, and I suspect that will keep some people on 1.3 for a little while. Let's be honest, there are still people running Apache 1.2 and even 0.8, so nobody thinks that every machine running Apache is suddenly going to migrate to 2.0. Apache tends to do what people need it to do. 2.0 just allows it to do more.
Boardwatch: Who should/shouldn't use the alpha or beta releases?
Bloom: The alpha releases are developer releases, so if you aren't comfortable patching code and fixing bugs, you should probably avoid the alphas. The betas should be stable enough to leave running as a production server, but there will still be issues, so only people comfortable with debugging problems and helping to fix them should really be using the betas. (Napster is using alpha 6 to run their web site.)
Boardwatch: For server administrators, what guidelines can you give about who should or shouldn't upgrade to Apache 2.0 when it becomes a final release? For what reasons?
Bloom: Personally, I think EVERYBODY should upgrade to Apache 2.0. :-) Apache 2.0 has a lot of new features that are going to become very important once it is released. However, administrators need to take it slowly, and become comfortable with Apache 2.0. Anybody who is not on a Unix platform should definitely upgrade immediately. Apache has made huge strides in portability with 2.0, and this shows on non-Unix machines.
van Gulik: I'd concur with Ryan insofar as the non-Unix platforms are concerned; even an early beta might give your site a major boost. As for the established sites running on well-understood platforms such as Sun Solaris and the BSDs, I am not so sure they will upgrade quickly or see the need. The forking model has proven to be very robust and scalable, so those folks will need features such as the filter chains to be tempted to migrate.
Boardwatch: Apache is the most popular web server out there, meeting the needs of many thousands of webmasters. What else is there to do? What is on the Apache web server team's "wish list" for the future?
Bloom: Oh, what isn't on it? I think for the most part, we are focusing on 2.0 right now, with the list of features that I have mentioned above. We are very interested in the effect of internationalization on the web and Apache in particular. There are people who want to see an async I/O implementation of Apache. I think that we will see some of the Apache group's focus move from HTTP to other protocols that complement it, such as WAP and FTP. And I think finally we want to continue to develop good stable software that does what is needed. I do think that we are close to hitting a point where there isn't anything left to add to Apache that can't be added with modules. When that happens, Apache will just become a framework to hang small modules off of and it will quiet down and not be released very often.
Boardwatch: Anything else you'd like to add? ;)
Bloom: Just that Apache 2.0 is coming very close to its first beta, and hopefully not long after that we will see an actual release. The Apache Group has worked long and hard on this project, and we all hope that people find our work useful and Apache continues to be successful.
Want more of this article? Read the full interview at www.ispworld.com.