HTTP, the Hypertext Transfer Protocol, is the definition of how web
browsers, web servers, and proxy caches communicate with one another.
This isn't the whole picture, since HTTP itself uses TCP/IP to do its
work, but learning about it is a good first step toward the in-depth
technical knowledge needed by a good webmaster or web programmer.
In the Illustrated Guide, author Hethmon describes the HTTP protocol
from the point of view of a programmer writing a rudimentary web
server, using C++ code examples as his illustrations. This code, which
is available on the accompanying CD, builds a working HTTP server for
each version of the HTTP protocol through 1.1. Knowing C++ or at least
C will help you fully understand the workings of HTTP by reading this
book, but if you aren't a programmer you'll still be able to learn a
great deal. The book contains strong explanations of how the protocol
works from a programming-independent viewpoint.
A strong point of Hethmon's approach to the subject is platform
independence. The code works with both Windows NT and OS/2, and his
explanations show a decent understanding of how things work under
UNIX, as well. Another strong point is that the book covers HTTP 1.1
as well as versions 1.0 and 0.9. When I saw "covers HTTP 1.1" on the
cover I fully expected a small chapter tacked on to the end, but the
new version is integral to the book and the code. Although I was
pretty familiar with HTTP 1.0 already, the thorough coverage of all
three versions of the protocol will keep this book within arm's reach
of my desk as a reference.
The organization of the book is OK, but not great - it might be
difficult for someone not already familiar with HTTP to grasp things
with a straight read-through. Hethmon starts out giving an overview
of HTTP, covering the history of the protocol from 0.9 to 1.1. He
then covers how an HTTP request works, the entity and headers, then
the HTTP response. In each of these sections the C++ implementation is
given and well explained at the end of the chapter. After covering the
protocol, the book outlines socket programming and then has a full
chapter explaining how the whole server is put together, ending with
the implementation of CGI support.
The workings of proxies and other caches, and their effects on the
operation of HTTP transactions, are explained very well and in good
detail, another good reason for keeping this book handy.
My only major quibble with the book is that its focus on the
implementation of a web server, although invaluable for gaining a
strong understanding of how things work, means you may have to do
some work to figure out the impact of HTTP on other things, like CGI
programming. One specific example is the change in the way HTTP 1.1
handles content length.
With HTTP 1.0, it's not necessary to specify the length of the data
your CGI program sends to the browser, but Hethmon suggests that
version 1.1 requires a correct content-length field, since the new
version supports persistent connections where multiple documents are
sent one after the other. What isn't clear from the book, or from
the RFC defining HTTP 1.1, is whether this means CGI programs will
have to be re-written to send the correct content length up front.
HTTP 1.1 does provide for chunked encoding, which lets the sender
declare a length for each piece as it goes. But this still looks like a hell
of a lot of work for CGI programmers, and would require CGI programs
to be rewritten to work on an HTTP 1.1 server. This may not be the
case - CGI programs may be able to get away without sending a length -
but the book doesn't address the question.
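
To make the chunked option concrete, here is a minimal C++ sketch of
how output of unknown total length can be framed as HTTP 1.1 chunks.
It is not taken from Hethmon's code; the function names encode_chunk,
last_chunk, and encode_chunked_body are hypothetical, and the sketch
assumes no chunk extensions or trailer headers are used. Each chunk
is its size in hexadecimal, a CRLF, the data, and another CRLF; a
zero-size chunk ends the body.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: frame one piece of output as an HTTP/1.1 chunk.
// Wire format: <size in hex> CRLF <data> CRLF
std::string encode_chunk(const std::string& data) {
    std::ostringstream out;
    out << std::hex << data.size() << "\r\n" << data << "\r\n";
    return out.str();
}

// Hypothetical helper: the zero-size chunk that marks the end of the
// body (assumes no trailer headers follow it).
std::string last_chunk() {
    return "0\r\n\r\n";
}

// Frame a sequence of pieces, as they might arrive from a program that
// does not know its total output length in advance.
std::string encode_chunked_body(const std::vector<std::string>& pieces) {
    std::string body;
    for (const std::string& piece : pieces) {
        // Skip empty pieces: a zero-size chunk would end the body early.
        if (!piece.empty()) {
            body += encode_chunk(piece);
        }
    }
    body += last_chunk();
    return body;
}
```

A response framed this way carries a "Transfer-Encoding: chunked"
header in place of a content length. Whether a given server applies
this framing on a CGI program's behalf, or leaves it to the program,
is the question the book leaves open.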
This just demonstrates that the coverage of HTTP in the Illustrated
Guide is limited in viewpoint. However, the full protocol is well
documented, the code for implementing the server is complete and
reasonably platform independent, and the coverage of proxies and
caching is valuable. This book probably won't be the only book on
HTTP 1.1 you'll need, but it's a handy one to have around, and a
good starting point for learning.