WebDevelopersJournal.com: Tips on Web Page Design, HTML and Graphics
Speeding Up Your Web Server CGI

by Dave Cartwright

Supercharging Your Back End

Relatively few Web servers are dedicated to static HTML files. Most organisations use custom programming to add functionality to their Web sites. While some companies have just a few extras such as a guestbook or register-your-interest form, others rely entirely on custom scripts, replacing static content with pages built on the fly.
March 7, 2000

For simple installations with modest usage levels, the efficiency of scripts is of little interest - it doesn't matter if a program takes half a megabyte of RAM for a second or two. But where script use is heavy, the resources used by scripts and the time taken for them to execute are critical to the success of the site. Here, we start at the bottom - CGI - and look at ways of making custom applications run faster and more efficiently.

The Common Gateway Interface

CGI is the most common scripting system by far, for two reasons: it is the original, standard way to add back-end functionality to a Web server; and, more importantly, it is absolutely trivial to use.

Of course, all concepts in computing are compromises, and the cost of this simplicity is performance - CGI simply doesn't run efficiently, for a number of reasons. First, each time a script runs, the server has to create a process, load the code into the process's memory segment, execute the code, then destroy the process and reclaim its associated memory. Think how long it takes to load a normal application program into memory on your desktop machine. While CGI scripts are generally much smaller, the load/unload time will still be significant. More importantly, though, since CGI scripts are only intended to run for a second or two, the load/unload phase represents a large percentage of their total execution time.
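
The create/load/execute/destroy cycle described above looks roughly like this on a Unix-like server. This is a minimal sketch under stated assumptions, not any real server's code: the function name 'run_cgi_once' is made up for illustration, and a real server would also set up the environment variables and pipes before launching the script.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run one program to completion, the way a server
 * must for every single CGI request. Returns the program's exit status,
 * or -1 if the process could not be created or did not exit normally.
 * A program that could not be loaded is reported as status 127. */
int run_cgi_once(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t child = fork();                  /* 1. create a new process */
    if (child < 0)
        return -1;

    if (child == 0) {
        execv(argv[0], argv);              /* 2. load the program into it */
        _exit(127);                        /* only reached if the load failed */
    }

    int status;
    if (waitpid(child, &status, 0) < 0)    /* 3. wait while it executes */
        return -1;

    /* 4. the process is gone and its memory has been reclaimed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Every numbered step is repeated for every request, however small the script. Calling 'run_cgi_once' with the argument list '/bin/sh', '-c', 'exit 3' returns 3.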

The second issue is the way in which user data finds its way into the script (you probably wouldn't be using a script unless you wanted to act on some kind of user data, either from a form or via a hidden field or cookie used to track users around the site).

Servicing a GET request requires three environment variable lookups. Servicing a POST requires two environment variable lookups and a multi-character 'read' operation on the standard input stream. As with the process startup, this all takes time.
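
For readers who have not written a CGI program in C, the lookups and the read might look something like the sketch below. REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are standard CGI variable names; the function names and the split into two functions are choices made for this illustration, and the sketch makes no attempt to reproduce exactly which lookups the article is counting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the raw form data for one request into buf as a C string.
 * method, query and content_length are the values of the corresponding
 * CGI variables (any may be NULL); in is the stream carrying a POST body.
 * Returns the number of bytes stored, or -1 on any error. */
long collect_form_data(const char *method, const char *query,
                       const char *content_length, FILE *in,
                       char *buf, size_t bufsize)
{
    assert(buf != NULL && bufsize > 0);
    if (method == NULL)
        return -1;

    if (strcmp(method, "GET") == 0) {
        /* GET: the data itself arrives in an environment variable. */
        if (query == NULL)
            query = "";
        size_t len = strlen(query);
        if (len >= bufsize)
            return -1;
        memcpy(buf, query, len + 1);
        return (long)len;
    }

    if (strcmp(method, "POST") == 0) {
        /* POST: an environment variable gives the length, and the
         * data itself is read from the input stream. */
        if (content_length == NULL || in == NULL)
            return -1;
        char *end;
        long len = strtol(content_length, &end, 10);
        if (end == content_length || *end != '\0' || len < 0
            || (size_t)len >= bufsize)
            return -1;
        if (fread(buf, 1, (size_t)len, in) != (size_t)len)
            return -1;
        buf[len] = '\0';
        return len;
    }

    return -1;   /* other methods are not handled in this sketch */
}

/* What a CGI program would call: do the lookups, then read from stdin. */
long read_cgi_input(char *buf, size_t bufsize)
{
    return collect_form_data(getenv("REQUEST_METHOD"),
                             getenv("QUERY_STRING"),
                             getenv("CONTENT_LENGTH"),
                             stdin, buf, bufsize);
}
```

Keeping the environment lookups in a thin wrapper means the parsing logic can be exercised without a Web server: pass the three strings and any open stream directly to 'collect_form_data'.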

Finally, one must not forget the nature of the program itself. The first aspect here is the choice between an interpreted language (e.g. Perl) and a compiled language (C or C++); interpreted languages are generally slower than compiled ones, as the high-level language has to be translated by the interpreter during each run (compiled programs are in machine code already). The second aspect is the skill of the programmer (and the quality of the compiler) - many programs are thrown together with little thought for their efficiency.

Step 1 - Sensible code

Take the following two code segments.

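#include <stdio.h>
#include <string.h>

/* Version 1: calls strlen() on every pass through the loop. */
void myFunc(const char *s)
{
    size_t x;
    x = 0;
    while (x < strlen(s))
    {
        printf("%c", s[x]);
        x++;
    }
    printf("\n");
}

/* Version 2 (an alternative to version 1, not to be compiled alongside
   it): measures the string once and reuses the result. */
void myFunc(const char *s)
{
    size_t x = 0, z;
    z = strlen(s);
    while (x < z)
    {
        printf("%c", s[x++]);
    }
    printf("\n");
}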

Both code segments do the same thing - print the contents of string 's' character by character. But the second is more efficient than the first. It initialises 'x' with the value 0 inside the variable declaration and folds the increment into the array index ('s[x++]'), which at best saves a few CPU cycles. Far more importantly, it uses a variable ('z') to hold the length of string 's' rather than calling the 'strlen' function on each iteration of the loop. Since 'strlen' has to scan the whole string every time it is called, the first version does an amount of work that grows with the square of the string's length.

Step 2 - Turn on the optimisation features

Of course, while it is always sensible to try to optimise your code when you write it, a decent compiler would spot that the 'strlen' result in the previous example was likely to be used several times and optimise it appropriately. It is a sad fact that many people don't know how to turn on code optimisation in their compilers or even that such features exist. Adding three characters to the command line of the Unix C compiler 'gcc' can improve the eventual performance of the code by several per cent.
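
The effect of both the hand optimisation and the compiler's optimiser can be measured rather than guessed at. The sketch below is one way to do it, with assumptions stated up front: the function names are invented for this example, the loops sum the characters instead of printing them so that output time does not swamp the measurement, and the timing uses the standard 'clock' function. Compile it once as it stands and once with an optimisation flag such as gcc's '-O2' (the article does not say which three characters it has in mind, so that choice is an assumption) and compare the figures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Sums the characters of s, calling strlen() on every iteration. */
unsigned long sum_rescan(const char *s)
{
    unsigned long sum = 0;
    for (size_t x = 0; x < strlen(s); x++)
        sum += (unsigned char)s[x];
    return sum;
}

/* Same result, but the length is measured once and kept in z. */
unsigned long sum_cached(const char *s)
{
    unsigned long sum = 0;
    size_t z = strlen(s);
    for (size_t x = 0; x < z; x++)
        sum += (unsigned char)s[x];
    return sum;
}

/* CPU seconds taken to call fn(s) the given number of times. */
double seconds_for(unsigned long (*fn)(const char *), const char *s,
                   int repeats)
{
    assert(fn != NULL && s != NULL);
    volatile unsigned long sink = 0;   /* stops the results being thrown away */
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        sink += fn(s);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

To use it, build a long test string, then print 'seconds_for(sum_rescan, text, n)' and 'seconds_for(sum_cached, text, n)' side by side. The longer the string, the wider the gap should be in an unoptimised build; whether the optimiser closes that gap depends on the compiler and its settings.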

You can also decide how to trade performance against code size. On some systems you can link library functions 'statically' into your executable rather than having it make relatively slow calls to dynamic libraries at run time. The compromise is that you gain speed but increase the size of the executable, though if RAM is plentiful it may be an attractive option.

Step 3 - No interpreters, thanks

Writing efficient code and turning on the optimiser is a habit you should undoubtedly get into, but if you are using an interpreted language such as Perl, there is much further to go.

Compiled languages such as C produce CGI scripts which are executable chunks of binary code that can be run directly on the computer. Interpreted programs are chunks of high level language which are converted to machine code on the fly as the programs are run. Not only do you have the time overhead of performing this code conversion, but you also have the memory requirement not just for the finished machine code but for the source high-level code AND the interpreter program itself. This is why servers based on Perl can hit high load factors extremely quickly - not only is the memory full with copies of source code, object code and interpreter code, but the CPU is busy executing not only the CGI code itself but also the interpreter's instructions.

The answer, one could argue, is not to use interpreted languages such as Perl; this is a non-starter, though, because Perl's beauty is that it lends itself superbly to the string processing tasks that are the most common actions in CGI - the fact remains that it's generally more long-winded to write a given CGI script in C or C++ than in Perl.

Step 4 - Make the interpreter faster

In its most basic form, a Perl-based CGI script is far worse than a C-based one. As well as the startup and shutdown time for the script itself, one also has the startup and shutdown time for the Perl interpreter. Some Web server packages can sidestep this latter overhead by linking the interpreter tightly into the Web server itself. The most commonly used example is the Perl module for the Apache Web server, also known as mod_perl.

Instead of being loaded each time a URL pointing to a CGI script is requested, the Perl module is loaded when the Web server program starts up, and remains resident in memory. When a program needs to be run, the module is already there ready to accept the code, and the time between the call and the execution is much shorter.

Not only this, but mod_perl compiles the Perl source code into Perl's internal executable form once and then keeps it for later requests, further reducing the execution time for the script. An added advantage is that the resident Perl module can occupy less memory than launching the traditional interpreter afresh for each request, so you can have more concurrent processes before running out of physical RAM and seeing the machine start to use its swapfiles.
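
The "compile once, keep for later" idea can be shown with a toy model. The sketch below is not how mod_perl is implemented; every name in it is hypothetical, and 'compile_script' is a stand-in that merely counts how often the expensive step runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 16
#define PATH_MAX_LEN 128

typedef struct {
    char path[PATH_MAX_LEN];
    int  compiled;              /* stand-in for the script's compiled form */
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static size_t cache_used = 0;
static int compile_count = 0;   /* how many times the slow step really ran */

/* Stand-in for the expensive parse-and-compile step. */
static int compile_script(const char *path)
{
    compile_count++;
    return (int)strlen(path);   /* dummy "compiled form" */
}

/* Return the compiled form of a script, compiling it only on first use. */
int get_compiled(const char *path)
{
    assert(path != NULL && strlen(path) < PATH_MAX_LEN);

    for (size_t i = 0; i < cache_used; i++)
        if (strcmp(cache[i].path, path) == 0)
            return cache[i].compiled;        /* hit: no compile needed */

    int compiled = compile_script(path);     /* miss: pay the cost once */
    if (cache_used < CACHE_SLOTS) {
        strcpy(cache[cache_used].path, path);
        cache[cache_used].compiled = compiled;
        cache_used++;
    }
    return compiled;
}
```

Requesting the same script twice leaves 'compile_count' at 1. The cache only helps because the process holding it stays alive between requests, which is exactly what plain CGI cannot offer.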

Step 5 - Write for the API yourself

The Perl module is faster than using the standard Perl interpreter via CGI because it uses the Application Programming Interface (API) of the server to tie itself more tightly into the server code. Using the API means that the code (a) is resident from server startup instead of being loaded on the fly; and (b) communicates with the server program more efficiently than the rather clunky environment variable exchange that CGI uses.

But why not cut out the final obvious component of the execution chain - the Perl module itself? Instead of letting the CGI program communicate with the Perl module, and the Perl module with the Web server API, why not simply write your own script so that it communicates directly with the server API? This cuts down the amount of interaction and the memory footprint, and hence improves the overall speed of execution.

This is a perfectly reasonable desire, but there is one downside to direct API programming - namely that it is considerably harder than using the extremely basic CGI approach. APIs, though similar, are generally not compatible between brands of server, and the variety of system calls and library hooks is far greater than CGI with its simple use of environment variables and the 'standard' input stream. This said, though, the API for a given platform is usually well documented and if you really want to make your customised scripts fly, you can most definitely use it to your advantage.
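
To give a feel for the general shape of API programming, here is a sketch of an invented, much simplified server interface. It is emphatically not the API of Apache or of any other real server: the 'request' structure, 'register_handler' and 'dispatch' are all hypothetical names. What it does show is the two properties discussed above, namely code that is registered once at startup and a request that reaches it as an ordinary function call with its data passed in a structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical request record handed to a handler by the server. */
typedef struct {
    const char *path;     /* URL path that was requested */
    const char *query;    /* raw query string, possibly empty */
} request;

/* A handler writes its response into out and returns an HTTP status. */
typedef int (*handler_fn)(const request *req, char *out, size_t outsize);

#define MAX_HANDLERS 8
static struct { const char *path; handler_fn fn; } handlers[MAX_HANDLERS];
static size_t handler_count = 0;

/* Called once at server startup: the code stays resident from then on. */
int register_handler(const char *path, handler_fn fn)
{
    assert(path != NULL && fn != NULL);
    if (handler_count == MAX_HANDLERS)
        return -1;
    handlers[handler_count].path = path;
    handlers[handler_count].fn = fn;
    handler_count++;
    return 0;
}

/* Called per request: a plain function call, no new process. */
int dispatch(const request *req, char *out, size_t outsize)
{
    for (size_t i = 0; i < handler_count; i++)
        if (strcmp(handlers[i].path, req->path) == 0)
            return handlers[i].fn(req, out, outsize);
    return 404;   /* nothing registered for this path */
}

/* Example handler: echoes the query string back. */
int echo_handler(const request *req, char *out, size_t outsize)
{
    snprintf(out, outsize, "You sent: %s", req->query);
    return 200;
}
```

After 'register_handler("/echo", echo_handler)', dispatching a request for '/echo' with the query 'x=1' returns 200 and leaves 'You sent: x=1' in the output buffer. A real server API is far larger than this, which is the downside described above.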

Step 6 - Reduce the load on the server

Languages such as Java and JavaScript, combined with developments in the underlying Web technology (the move of imagemaps from the server to the client, for instance) bring with them the opportunity to improve the server's efficiency simply by reducing the load and exploiting the client machine's inevitable surplus of processor power.

Running a client-side imagemap instead of a server-side one gives a slightly longer download time (for the map's clickable area descriptions) but eliminates the need for the server to do any work processing the user's click. Imagine you have a form with compulsory fields and input items whose format is crucial. It is far more efficient for simple JavaScript to handle "You didn't fill in X" or "The date should be entered as dd/mm/yyyy" at the client end than for the server to be given the onerous task of basic form validation (perhaps several times per user if multiple entries are missed or incorrect). The more processing that can be done at the client, the less work the server has to do and the more resources become available for 'real' server work - all at the cost of downloading a few dozen bytes of JavaScript.

The compromise with raw speed

Everything discussed so far does indeed make the back end of your Web site faster. However, if this were the sole requirement for Web servers, everyone would be an Apache API guru. The fact remains that this is simply not the case, and the reasons are relatively obvious:

  • Time differences: while a program written in C++, run through an optimising compiler and hooked directly into the server API will certainly be screamingly fast, a Perl program run using mod_perl will not be very much slower.
  • Development time: the C++ program mentioned in the previous example may well take twice or three times as long to write as the Perl program. Not only this, but Perl is far better suited than C++ to the largely text-crunching nature of CGI.
  • Interaction: many scripts need to interact with other parts of the system, or even other parts of the network - database servers are the most common examples here. The time taken for this interaction between machines will be far greater than any time savings made by programming the script to death.

This is why systems not mentioned so far such as Active Server Pages (ASP) and Java servlets are so popular. Neither is hard to learn (ASP in particular), and both are designed with Internet operation in mind. JDBC is there to provide database interaction for Java. ASP exploits ODBC to provide links with back-end databases such as Oracle or SQL Server. And with the current generation of Web-centric implementations from the likes of Oracle, CA (Ingres) and Informix, the database and the Web server engine are pretty much one and the same anyway.

As with any computing problem, making your server back-end fly is a compromise of server speed, development speed and skills requirements. If you are a Perl person, for instance, you'd be dumb not to accelerate execution and reduce memory usage by using mod_perl, but it wouldn't be worth learning C or C++ just to spend hours milking microseconds out of the server API.

Likewise, if you are an ASP user, you would benefit little from moving to Perl or C for the sake of gaining a little speed as it would then be far harder to interact with back-end databases - though you could potentially save time by addressing network bottlenecks. Choose your scripting system sensibly, develop sensible code, and use whatever optimisation you can, but not at the mindless expense of usability and features.



Dave Cartwright's first proper job was running NetWare 2.0a servers for a defence contractor and fighting with a digital phone switch (one of the first of its kind). Having graduated with a boringly technical degree in theoretical computer science, he became a Unix systems and network manager at UEA, Norwich, UK. While there, the Internet came to UK academia and later Mr. Berners-Lee came up with this Web thing (an excellent excuse to 'research new technology' rather than doing boring support stuff). Before disappearing into journalism in '95 (as technical editor of Network Week) Dave did a lot of work back-ending Web servers with databases. Having earned an easy living for a couple of years as a techie writer, he then went back into the real world as IT and Telecomms Manager at CMP UK. He's now Chief Technology Officer at Vavo.com.