The Next Apache
APACHE TEAM MEMBERS TALK ABOUT THE BIGGEST
UPGRADE TO THE BIGGEST PROGRAM ON THE WEB
By Jeffrey Carl
The Apache webserver is (along with Linux and Perl) probably the most widely used
open-source software in the world. After beginning as a set of patches to the
NCSA http server (“a patchy server” was how it got its name), Apache had moved
by 1996 to become the most popular http server out there. According to
Netcraft, Apache today powers 59 percent of all the webservers on the Internet,
far more than the 20 percent share of the runner-up, Microsoft Internet
Information Server.
Apache’s last “full-point” release (Apache 1.0) came out on December 1, 1995, and it
has been five years since then. Naturally, there’s a lot of excitement about the
long-awaited Apache 2.0, which should be in beta release by the time you read
this. To find out what’s new with Apache 2, I asked the Apache Project’s Dirk-Willem
van Gulik and Ryan Bloom. The following are selected portions of an e-mail
interview with these Apache team members:
Boardwatch: Why was the Apache 2.0 project started?
What shortcomings in Apache 1.x was it created to address, or what missing
features was it designed to implement?
Bloom: There were a couple of
reasons for Apache 2.0. The first was that a pre-forking server just doesn't
scale on some platforms. One of the biggest culprits was AIX, and since IBM had
just made a commitment to using and delivering Apache, it was important to them
that we get threads into Apache. Since forcing threads into 1.3 would be a very
complex job, starting 2.0 made more sense. Another problem was getting Apache
to run cleanly on non-Unix platforms. Apache has worked on OS/2 and Windows for
a long time, but as we added more platforms (Netware, Mac OS X, BeOS) the code
became harder and harder to maintain.
Apache 2.0 was written with an eye towards portability, using the new Apache Portable
Run-time (APR), so adding new platforms is simple. Also, by using APR we are
able to improve performance on non-Unix platforms, because we are using native
function calls on all platforms. Finally, in Apache 2.0, we have implemented
Filtered I/O. This is a feature that module writers have been requesting for
years. It basically allows one module to modify the data from another module.
This allows CGI responses to be parsed for PHP or SSI tags. It also allows the
proxy module to filter data.
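Editor's sketch: the idea of one module modifying another module's output can be illustrated with a short, self-contained Python example. This is not Apache's filter API. The names `cgi_handler`, `ssi_filter`, `upper_filter` and `run_chain` are hypothetical, the tag syntax is only SSI-like, and the sketch assumes a tag never straddles two chunks.

```python
from typing import Callable, Iterable, Iterator, List

# A filter takes a stream of chunks and yields a (possibly modified) stream.
Filter = Callable[[Iterable[bytes]], Iterator[bytes]]

TAG = b'<!--#echo var="SERVER" -->'  # hypothetical SSI-like tag


def cgi_handler() -> Iterator[bytes]:
    """Stand-in for a CGI script that emits its response in chunks."""
    yield b"<p>Served by "
    yield TAG
    yield b"</p>"


def ssi_filter(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Expand the tag in each chunk (assumes tags do not span chunks)."""
    for chunk in chunks:
        yield chunk.replace(TAG, b"Apache/2.0")


def upper_filter(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """A second, unrelated filter that upper-cases whatever passes through."""
    for chunk in chunks:
        yield chunk.upper()


def run_chain(source: Iterable[bytes], filters: List[Filter]) -> bytes:
    """Pass the generator's output through each filter in order."""
    stream: Iterable[bytes] = source
    for f in filters:
        stream = f(stream)
    return b"".join(stream)


if __name__ == "__main__":
    print(run_chain(cgi_handler(), [ssi_filter]))
    print(run_chain(cgi_handler(), [ssi_filter, upper_filter]))
```

The content generator knows nothing about the filters, and each filter knows nothing about the others. Order matters in this sketch: running `upper_filter` first would upper-case the tag before `ssi_filter` could match it.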
Boardwatch: What are the significant new features of
Apache 2.0?
Bloom: Threading, APR, Filtered I/O. :-) And
Multi-Processing modules and Protocol modules. These are two new module types
that allow module writers more flexibility. A Multi-Processing module basically
defines how the server is started, and how it maps requests onto threads and
processes. This is an abstraction between Apache's execution profile and the
platform it is running on. Different platforms have different needs, and the
MPM interface allows porters to define the best setup for their platform. For
example, Windows uses two processes. The first monitors the second. The second
serves pages. This is done by a Multi-Processing Module.
Protocol modules are modules that allow Apache to serve more than just HTTP requests. In
this respect, Apache can act like inetd on steroids. Basically, each thread in
each process can handle either HTTP or FTP or BXXP or WAP requests at any time,
as long as those protocol modules are in the server. This means no forking a
new process just to handle a new request type. If 90% of your site is served by
HTTP, and 10% is served by WAP, then the server automatically adjusts to
accommodate that. As the site migrates to WAP instead of HTTP, the server
continues to serve whatever is requested. There is no extra setup involved. I
should mention now that only an HTTP protocol module has been written, although
there is talk of adding others.
Boardwatch: How much of a break with the past is
Apache 2.0, in terms of 1.) the existing code base, 2.) the administration
interface, and 3.) the API for modules?
Bloom: I'll answer this in three parts.
1) The protocol handling itself is mostly the same. How the server starts and
stops, generates data, sends data, and does anything else is completely
different. By adding threads to the mix, we realized that we needed a new
abstraction to allow different platforms to start-up differently and to map a
request to a thread or process the best way for that platform.
2) The administrative interface is the same. We still use a text file, and the
language hasn't changed at all. We have added some directives to the config
file, but if somebody wants to use the prefork MPM (it acts just like 1.3),
then a 1.3 config file will migrate seamlessly into 2.0. If the MPM is changed,
then the config file will need slight modifications to work. Also, some of the
old directives no longer do what they used to. The definition of the directive
is the same, but the way the code works is completely different, so they don't
always map. For example, SetHandler isn't as important as it once was, but the
Filter directives take its place.
3) The module API has changed a lot. Because we now rely on APR for most of the
low-level routines, like file I/O and network I/O, the module doesn't have as
much flexibility when dealing with the OS. However, in exchange, the module has
greater portability. Also the module structure that is at the bottom of every
module has shrunk to about 5 functions. The others are registered with function
calls. This allows the Apache group to add new hooks without breaking existing
modules. Also, with the filter API, modules should take more care to generate
data in sizable chunks.
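Editor's sketch: the point about registering hooks with function calls, rather than filling fixed slots in a module structure, can be shown in a few lines of Python. This is an illustration of the general pattern only, not Apache's C API. `HookRegistry`, `register_hello_module` and the hook names are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Request = Dict[str, str]
Callback = Callable[[Request], None]


class HookRegistry:
    """Maps hook names to the callbacks that modules have registered."""

    def __init__(self) -> None:
        self._hooks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register(self, hook_name: str, callback: Callback) -> None:
        self._hooks[hook_name].append(callback)

    def run(self, hook_name: str, request: Request) -> Request:
        # Hooks with no registered callbacks are simply a no-op.
        for callback in self._hooks[hook_name]:
            callback(request)
        return request


def register_hello_module(registry: HookRegistry) -> None:
    """A 'module' that only knows about the one hook it cares about."""

    def handler(request: Request) -> None:
        request["body"] = "hello from " + request["uri"]

    registry.register("handler", handler)


if __name__ == "__main__":
    registry = HookRegistry()
    register_hello_module(registry)
    print(registry.run("handler", {"uri": "/index.html"}))
    # A hook added to the server later does not affect the older module.
    print(registry.run("log_transaction", {"uri": "/index.html"}))
```

Because the module names only the hooks it uses, the server can introduce a new hook (here the made-up `log_transaction`) without changing the layout that existing modules were written against.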
Boardwatch: What are the advantages and
disadvantages (if any) of Apache 2.0's multithreaded style? What does it mean
to have the option of being multi-process and multi-threaded?
Bloom: The multi-threading gives us greater
scalability. I have actually seen an AIX box go from being saturated at 500
connections to being saturated at more than 1000. As for disadvantages, you
lose some robustness with this model. If a module segfaults in 1.3, you lose
one connection, the connection currently running on that process. If a module
segfaults in 2.0, you lose N connections, depending on how many threads are
running in that process, which
MPM you have chosen, etc. However, we have different MPMs distributed with the
server, so a site that only cares about robustness can still use the 1.3
pre-forking model. A site that doesn't care about robustness or only has VERY
trusted code can run a server that has more threads in it.
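Editor's sketch: the tradeoff Bloom describes comes down to simple arithmetic. The Python below compares two configurations for the same connection target. The function name `plan` and the figure of 50 threads per process are assumptions for illustration; the 1,000-connection target echoes the AIX example above.

```python
import math
from typing import Dict


def plan(max_connections: int, threads_per_process: int) -> Dict[str, int]:
    """Processes needed for a connection target, and the worst-case number
    of connections lost if a single process crashes."""
    processes = math.ceil(max_connections / threads_per_process)
    return {
        "processes": processes,
        # Every thread in the crashed process may be serving a connection.
        "worst_case_lost": threads_per_process,
    }


if __name__ == "__main__":
    print("prefork :", plan(1000, 1))   # one connection per process
    print("threaded:", plan(1000, 50))  # assumed 50 threads per process
```

With one connection per process, a crash costs a single connection but the server needs 1,000 processes. With the assumed 50 threads per process, 20 processes are enough, but one segfault can drop up to 50 connections.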
van Gulik: In other words, Apache 2.0 allows the webmaster to make his or her
own tradeoffs between scalability, stability and speed. This opens a whole new
world of Quality of Service (QoS) management. Another advantage of these
flexible process management models is that integration with languages like
Perl, PHP, and in particular Java, can be made cleaner and more robust without
losing much performance. Large e-commerce integration projects, especially,
will be significantly easier.
Boardwatch: What is the Apache Portable Runtime?
What effect does this have on the code, and on portability?
Bloom: The Apache Portable
Runtime is exactly what it says. :-) It is a library of routines that Apache is
using to make itself more portable. This makes the code much more portable and
shrinks the code size, making the code easier to maintain. My favorite example is apachebench, a
simple benchmarking tool distributed with the server. AB has never worked on
anything other than Unix. We ported it to APR, and it works on Unix, Windows,
BeOS, OS/2, etc. without any work. As more platforms are ported to APR, AB will
just work on them, as will Apache. This also improves our performance on
non-POSIX platforms. Apache on Windows, the last I checked, is running as fast
as Apache on Linux.
Boardwatch: Can Apache 1.3.x modules be used with
Apache 2.0? How will this affect things like Apache-PHP or Apache-FrontPage?
Bloom: Unfortunately, no. However, they are
very easy to port. I have personally ported many complex modules in under an
hour. The PHP team is already working on a port of PHP to 2.0, as is mod_perl.
Mod_perl has support for some of the more interesting features already, such as
writing filters in Perl, and writing protocol modules in Perl. FrontPage will
hopefully be subsumed by DAV, which is now distributed with Apache 2.0.
van Gulik: Plus it is likely that various Apache-focused companies, such as
IBM, Covalent and C2/RedHat, will assist customers with the transition with
specific tools and migration products.
Boardwatch: Do you predict that server
administrators used to Apache 1.3.x will have a hard time adjusting to anything
about Apache 2.0? If so, what?
Bloom: I think many admins will
move slowly to Apache 2.0. Apache 2.0 has a lot of features that people have
been asking for for a very long time. The threading issues will take some
getting used to, however, and I suspect that will keep some people on 1.3 for a
little while. Let's be honest, there are still people running Apache 1.2 and
even 0.8, so nobody thinks that every machine running Apache is suddenly going
to migrate to 2.0. Apache tends to do what people need it to do. 2.0 just
allows it to do more.
Boardwatch: Who should/shouldn't use the alpha or
beta releases?
Bloom: The alpha releases are
developer releases, so if you aren't comfortable patching code and fixing bugs,
you should probably avoid the alphas. The betas should be stable enough to
leave running as a production server, but there will still be issues so only
people comfortable with debugging problems and helping to fix them should
really be using the betas. (Napster is using alpha 6 to run their web site.)
Boardwatch: For server administrators, what
guidelines can you give about who should or shouldn't upgrade to Apache 2.0
when it becomes a final release? For what reasons?
Bloom: Personally, I think EVERYBODY should
upgrade to Apache 2.0. :-) Apache 2.0 has a lot of new features that are going
to become very important once it is released. However, administrators need to
take it slowly, and become comfortable with Apache 2.0. Anybody who is not on a
Unix platform should definitely upgrade immediately. Apache has made huge
strides in portability with 2.0, and this shows on non-Unix machines.
van Gulik: I'd concur with Ryan insofar as the non-Unix platforms are
concerned; even an early beta might give your site a major boost. As for the
established sites running on well-understood platforms such as Sun Solaris and
the BSDs, I am not so sure that they will upgrade quickly or see the need. The
forking model has proven to be very robust and scalable. Those folks will need
features such as the filter chains to be tempted to migrate.
Boardwatch: Apache is the most popular web server
out there, meeting the needs of many thousands of webmasters. What else is
there to do? What is on the Apache web server team's "wish list" for
the future?
Bloom: Oh, what isn't on it? I think for the
most part, we are focusing on 2.0 right now, with the list of features that I
have mentioned above. We are very interested in the effect of
internationalization on the web and Apache in particular. There are people who
want to see an async I/O implementation of Apache. I think that we will see
some of the Apache group's focus move from HTTP to other protocols that
complement it, such as WAP and FTP. And I think finally we want to continue to
develop good stable software that does what is needed. I do think that we are
close to hitting a point where there isn't anything left to add to Apache that
can't be added with modules. When that happens, Apache will just become a
framework to hang small modules off of and it will quiet down and not be
released very often.
Boardwatch: Anything else you'd like to add? ;)
Bloom: Just that Apache 2.0 is coming very
close to its first beta, and hopefully not long after that we will see an
actual release. The Apache Group has worked long and hard on this project, and
we all hope that people find our work useful and Apache continues to be
successful.
Want more of this article? Read the full interview at www.ispworld.com.