
As part of a content management system I'm developing, I have a script which retrieves image files (JPEG, GIF, PNG, etc.) in response to the browser GETting a URL like http://myserver/getimage.cgi/virtual/path/to/image. On the server the image files are stored outside DOCUMENT_ROOT as randomly-named blobs, and a database keeps track of the metadata, in particular the correspondence between virtual path, blob filename and MIME type. The script looks like this:

#!/usr/bin/perl
use strict;
use warnings;

use CGI::Simple;
use File::Copy;
use MYSTUFF 'dblookup';

# Map the virtual path in PATH_INFO to a MIME type and blob filename.
my $q = CGI::Simple->new;
my ($mimetype, $filepath) = dblookup($q->path_info);

$| = 1;          # enable autoflush so the header is output before calling copy()
binmode STDOUT;  # the blobs are binary, so avoid any newline translation
print $q->header(-type => $mimetype);

# Stream the blob to STDOUT, which Apache has connected to the client.
copy($filepath, \*STDOUT) or die "copy failed: $!";

The dblookup function is exported by MYSTUFF.pm and looks up the MIME type and blob file path for the virtual path passed in via the standard CGI PATH_INFO environment variable.
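
For context, a dblookup() along these lines might look something like the sketch below; the DBI connection string, table and column names are invented for illustration and are not part of the real MYSTUFF.pm:

package MYSTUFF;
use strict;
use warnings;
use DBI;
use Exporter 'import';
our @EXPORT_OK = ('dblookup');

# Hypothetical lookup: map a virtual path to (MIME type, blob file path).
# The schema (table "images", columns virtual_path/mime_type/blob_path)
# is assumed purely for illustration.
sub dblookup {
    my ($virtual_path) = @_;
    my $dbh = DBI->connect('dbi:SQLite:dbname=cms.db', '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    my ($mimetype, $filepath) = $dbh->selectrow_array(
        'SELECT mime_type, blob_path FROM images WHERE virtual_path = ?',
        undef, $virtual_path);
    return ($mimetype, $filepath);
}

1;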

My concern is how the output of the CGI script is spooled by the Apache server before it starts sending it back to the browser. If Apache spools the entire output then there's potentially a need for a huge amount of spool space on the server, because the image files can be tens or hundreds of megabytes, and when I start supporting video files they could run to gigabytes.

Is the Apache server sensible enough to spool the STDOUT stream from the CGI script only up to the point where it has all the headers (i.e. the first "\n\n", which is generated by $q->header()), then start copying the data buffer-for-buffer from STDOUT to whatever socket is attached to the HTTP connection back to the browser? The documentation for File::Copy suggests that it will use a 1K buffer, so if Apache behaves in the way I've outlined then I don't really have a problem, do I? The IPC sockets/pipes will impose flow control on my CGI script, removing the need for any spool space beyond the buffers that are already there.
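
To make the buffering explicit, here is a hand-rolled equivalent of the copy() call above (purely an illustration; the 64 KB chunk size is an arbitrary choice, not necessarily what File::Copy uses):

open my $in, '<:raw', $filepath or die "Cannot open $filepath: $!";
binmode STDOUT;
my $buf;
while (my $bytes = read($in, $buf, 64 * 1024)) {
    # print blocks here whenever the pipe/socket to Apache is full,
    # so the script never holds more than one chunk in memory itself.
    print STDOUT $buf;
}
close $in;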

kbro
  • Please don't downvote this question simply because you don't like the technology I'm asking about. I know it's old. I know there are sexier new kids on the block. I also know that my code already exists and that it will cost time and money to change, so I need to know how bad this old stuff is before I rip it up and start again. – kbro Feb 14 '17 at 12:40

1 Answer


I'm not sure how Apache handles this. However, if you're asking about this for performance reasons, then I wonder why you're using CGI to begin with...

A few alternatives you might want to consider:

  • Place the files in your document root and use mod_negotiation to pick the right MIME type based on server- and client-side configuration
  • Use mod_rewrite to tell Apache to read a particular file in response to a particular request. It is possible to vary the produced result here based on, e.g., cookies or other things set by the request; for more details, see the mod_rewrite documentation
  • If your script does more than just reading a file from disk and neither of the two options above works, have a look at Mojolicious for a modern Perl web framework that does not require CGI (a minimal sketch follows).
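
For example, a minimal Mojolicious::Lite version of the script in the question might look like this; the route, the port and the reuse of dblookup() are assumptions for illustration, not tested code:

#!/usr/bin/perl
use Mojolicious::Lite;
use Mojo::Asset::File;
use MYSTUFF 'dblookup';

# Catch-all route: everything after / is treated as the virtual path.
get '/*vpath' => sub {
    my $c = shift;
    my ($mimetype, $filepath) = dblookup('/' . $c->stash('vpath'));

    # Set the MIME type from the database, then let Mojolicious stream
    # the blob, with Range and If-Modified-Since handling for free.
    $c->res->headers->content_type($mimetype);
    $c->reply->asset(Mojo::Asset::File->new(path => $filepath));
};

app->start;    # e.g. "perl getimage.pl daemon -l http://*:8080"
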
Wouter Verhelst
  • Sorry, but this answer is not helpful. As you guessed, my database does a lot more, so mod_negotiation and mod_rewrite are not suitable. As for Mojolicious, while it's true that it avoids the use of the CGI.pm Perl module, it still relies on the Common Gateway Interface and therefore gives exactly the same problem of how the web server process receives the dynamic content from a separate page-generation process. Even with Mojolicious or other frameworks (even other languages, such as PHP) you ultimately write to STDOUT, meaning the web server still has to either spool it or pass it through. – kbro Jan 08 '17 at 14:24
  • I gave those first two options only for completeness. As for Mojolicious, while it's true that it supports running in CGI mode, that's by no means the *only* option. You can run it inside Apache with PSGI, or use one of its own httpd implementations, depending on what's best. None of those require writing to STDOUT. For more details, see http://mojolicious.org/perldoc/Mojolicious/Guides/Cookbook#DEPLOYMENT – Wouter Verhelst Jan 08 '17 at 15:57
  • I've had another quick look through Mojolicious, and it seems to use things like `$c->render( text => "Hello world")` rather than `print "Hello world"`, but STDOUT is still there, deep down, because that's the link between the process that's generating the content and the process that's sending it to the client via HTTP. I also see that Mojolicious::Controller has a write() method which lets you explicitly chunk large outputs (see the sketch after these comments). That looks like a backward step, because it suggests that Mojolicious spools ALL output before sending anything, so you have to design around that shortcoming. – kbro Jan 08 '17 at 16:27
  • All of which is getting away from my question - "how does Apache spool output from a CGI script?". I need to know that it does this badly before I try fixing the problem, whether by jumping to Mojolicious or some other technology. – kbro Jan 08 '17 at 16:30
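
For reference, the write() pattern mentioned two comments up looks roughly like this inside a hypothetical controller action; $mimetype and $filepath are assumed to come from a lookup like dblookup() in the question, and the chunk size is arbitrary:

open my $fh, '<:raw', $filepath or die "Cannot open $filepath: $!";
$c->res->headers->content_type($mimetype);

my $drain;
$drain = sub {
    my $c = shift;
    my $bytes = read($fh, my $chunk, 64 * 1024);
    return $c->finish unless $bytes;    # end of file: close the stream
    $c->write($chunk, $drain);          # send the next chunk once this one has drained
};
$c->$drain;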