How and why to use the Apache proxy server


July 30, 2012


Even if you have no interest in serving web pages from your new OS X box, there’s at least one feature of Apache (the built-in web server) that you might want to put to use – the proxy server.

A proxy server is nothing more than a server which sits between a client (such as a web browser) and a real server (such as a web host). It intercepts all requests sent by the client and decides if it can handle the request itself. If it cannot, it then passes the request on to the real server.
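To make the "handle it myself or pass it on" decision concrete, here is a toy Python model. It is not how mod_proxy is written (that is C inside Apache). The function `handle` and its `blocked`, `cache` and `fetch` parameters are hypothetical names chosen for illustration, and blocking is simplified to exact host names.

```python
from typing import Callable, Dict, Set
from urllib.parse import urlsplit


def handle(url: str, blocked: Set[str], cache: Dict[str, str],
           fetch: Callable[[str], str]) -> str:
    """Toy model of one proxy request: answer locally if possible, else forward."""
    host = urlsplit(url).hostname or ""
    if host in blocked:
        return "403 blocked"      # proxy answers by itself: refuse the request
    if url in cache:
        return cache[url]         # proxy answers by itself: stored copy
    body = fetch(url)             # proxy cannot answer: ask the real server
    cache[url] = body             # keep a copy for the next client
    return body
```

Calling `handle` twice for the same URL contacts `fetch` only once. A request for a blocked host never reaches `fetch` at all.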

Why might you find this useful? There are two primary reasons. First, if you're a parent, you can use the proxy server to control which sites your kids can and cannot visit. This may make you feel slightly more comfortable leaving them alone in front of the machine … although any child with some net experience will be able to find ways to get what they want anyway.

Since the proxy will block sites that you specify, you can also use it to block ad servers such as www.doubleclick.net (and there goes any chance of ever having advertisers on this site … want to get blacklisted? Just explain how to block ad servers! 😉)

The second usage is for caching web content locally. If you have a connection that’s shared between multiple computers, you can use the proxy to store pages locally. That way, if you browse cnn.com and your spouse visits the site 30 seconds later from another machine, they will get a locally cached page which will be served very quickly.
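The shared-cache scenario can be modelled in a few lines. This is a simplified sketch, not Apache's cache. The class `SharedCache`, its one-hour lifetime and the injectable `clock` are assumptions made for illustration. One instance stands in for the single proxy that every machine on the LAN talks to.

```python
import time
from typing import Callable, Dict, Tuple


class SharedCache:
    """Toy shared cache: every machine on the LAN goes through one instance."""

    def __init__(self, ttl_seconds: float, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds          # how long a stored page counts as fresh
        self.fetch = fetch              # goes out to the real web server
        self.clock = clock              # injectable so the example is testable
        self.entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> Tuple[str, bool]:
        """Return (page, served_from_cache)."""
        now = self.clock()
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], True       # fresh local copy: no trip to the Internet
        body = self.fetch(url)          # missing or stale: fetch and store it
        self.entries[url] = (now, body)
        return body, False
```

Suppose you load cnn.com at t = 0 and your spouse loads it at t = 30 seconds. The second `get` returns the stored page with `served_from_cache` set to `True`. Once the lifetime has passed, the next request goes back to the real server.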

Read the rest of this article if you’d like instructions on setting up Apache’s proxy server.

The first step is to make sure your web server is stopped. You can do this through the System Preferences -> Network -> Services tab, or via the terminal. Since we're going to do the rest of this in a terminal session, you might as well start there. Open a terminal and log in as root (type "su" and enter the root password). Then type

apachectl stop

This will stop the webserver.

Once Apache is stopped, you’ll want to make a backup copy of your configuration file, in case you make a mistake:

cd /Library/WebServer/Configuration
cp apache.conf apache.bak

Note that a backup file may already exist; you can use a new name if you’d like to keep everything intact.
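Plain `cp` silently overwrites an existing apache.bak (`cp -i` would at least ask first). If you would rather never lose an older backup, the sketch below picks the first unused name. The helper `backup` and its `.bak`, `.bak1`, `.bak2` naming scheme are hypothetical, and it assumes Python 3 is available.

```python
import shutil
from pathlib import Path


def backup(conf: Path) -> Path:
    """Copy conf to apache.bak, or apache.bak1, apache.bak2, ... if already taken."""
    candidate = conf.with_suffix(".bak")       # apache.conf -> apache.bak
    n = 1
    while candidate.exists():                  # never overwrite an older backup
        candidate = conf.with_suffix(f".bak{n}")
        n += 1
    shutil.copy2(conf, candidate)              # copy2 also keeps timestamps
    return candidate
```

Run it as root against /Library/WebServer/Configuration/apache.conf, and it prints nothing unless you print the returned path yourself.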

Pick your favorite editor, and edit the file named “apache.conf”. I’m using vi, but emacs and pico will also work.

First we need to load and add the proxy module to the server. Find the section (it’s around line 231 in my config):

#LoadModule digest_module      /System/Library/Apache/Modules/mod_digest.so
#LoadModule proxy_module       /System/Library/Apache/Modules/libproxy.so
#LoadModule cern_meta_module   /System/Library/Apache/Modules/mod_cern_meta.so

Remove the comment (“#”) from the second line.

Now search for this section (it’s around line 269 in my file):

#AddModule mod_digest.c
#AddModule mod_proxy.c
#AddModule mod_cern_meta.c

Again, remove the comment mark from the second line.
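Editing by hand in vi works fine. For the curious, the same two edits can be expressed as a small script. In this sketch (the function name `enable_proxy_modules` is hypothetical), only the proxy `LoadModule` and `AddModule` lines lose their leading "#". Neighbouring modules such as mod_digest stay commented out, and running it twice changes nothing.

```python
import re

# The two commented-out lines that enable mod_proxy.
PROXY_LINES = (
    re.compile(r"^#(LoadModule\s+proxy_module\s.*)$"),
    re.compile(r"^#(AddModule\s+mod_proxy\.c)\s*$"),
)


def enable_proxy_modules(text: str) -> str:
    """Remove the leading '#' from the proxy LoadModule/AddModule lines only."""
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]              # preserve the original newline
        for pattern in PROXY_LINES:
            match = pattern.match(body)
            if match:
                body = match.group(1)          # the line without its '#'
                break
        out.append(body + ending)
    return "".join(out)
```

To apply it, read apache.conf into a string, pass it through `enable_proxy_modules`, and write the result back after making your backup.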

These two steps activate the proxy server module. Now we need to do some basic configuration. Search for the section that begins as follows (it's around line 937 in my file):

# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:

Uncomment all the lines that are code (not comments). When you are done, it should look like this:

# Proxy Server directives. Uncomment the following lines to
# enable the proxy server:
#
<IfModule mod_proxy.c>
    ProxyRequests On
    
    <Directory proxy:*>
        Order deny,allow
        #Deny from all
        Allow from .your_domain.com
        #NOTE: Replace '.your_domain.com' with your IP number(s)!
    </Directory>

    #
    # Enable/disable the handling of HTTP/1.1 "Via:" headers.
    # ("Full" adds the server version; "Block" removes all outgoing Via: headers)
    # Set to one of: Off | On | Full | Block
    #
    ProxyVia On

    #
    # To enable the cache as well, edit and uncomment the following lines:
    # (no cacheing without CacheRoot)
    #
    CacheRoot "/Library/WebServer/ProxyCache"
    CacheSize 5
    CacheGcInterval 4
    CacheMaxExpire 24
    CacheLastModifiedFactor 0.1
    CacheDefaultExpire 1
    NoCache a_domain.com another_domain.edu joes.garage_sale.com

</IfModule>
# End of proxy directives.

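Notice two small things. First, Deny from all is left commented out. Be aware of what that means. With Order deny,allow, Apache admits any client that matches no Deny line. As written, anyone who can reach your machine can use it as a proxy, and the Allow from line restricts nothing. If your Mac is reachable from the Internet, uncomment Deny from all. The addresses on the Allow from line will still get through, because under this ordering a matching Allow overrides Deny. Second, enter the IP addresses you will allow to use the proxy server on the Allow from line, separated by spaces. Include 127.0.0.1 if you will browse through the proxy from the Mac itself.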
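The access rule is easier to see as a function. This is a simplified model of Apache's `Order deny,allow`, not Apache's code. The helpers `matches` and `deny_allow` are hypothetical. The model handles only `all`, full IP addresses and partial IP addresses such as `192.168.1`. It ignores host-name patterns like `.your_domain.com`, which need a DNS lookup of the client.

```python
from typing import Iterable


def matches(client_ip: str, pattern: str) -> bool:
    """'all', a full IP, or a partial IP such as '192.168.1' (whole bytes only)."""
    return (pattern == "all" or client_ip == pattern
            or client_ip.startswith(pattern + "."))


def deny_allow(client_ip: str, deny: Iterable[str], allow: Iterable[str]) -> bool:
    """Order deny,allow: Deny is checked first, a matching Allow overrides it,
    and a client that matches neither list is let in by default."""
    denied = any(matches(client_ip, p) for p in deny)
    allowed = any(matches(client_ip, p) for p in allow)
    return allowed or not denied
```

With an empty deny list (Deny from all commented out), `deny_allow` returns `True` for every address. With `deny=["all"]`, only the addresses on the allow list get through.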

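That's it for basic configuration. The cache settings are all very tweakable, and I highly recommend reading the mod_proxy documentation on the apache.org website for full details on all the variables and what they mean. Two things are worth knowing up front. First, CacheSize is measured in kilobytes, so the value of 5 above allows only 5 KB of cache; raise it considerably if you want caching to be useful. Second, the NoCache line lists placeholder domains. Replace them with sites you never want cached, or comment the line out.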
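Here is how the expiry settings interact, as I read the mod_proxy directive descriptions. A page with no expiry date gets a lifetime of CacheLastModifiedFactor times the time since it was last modified. CacheMaxExpire caps every lifetime. CacheDefaultExpire covers documents that carry no date information. The function below is a simplified model, not Apache's code. Its name and parameters are hypothetical, and the defaults copy the values in the config above.

```python
from typing import Optional


def expiry_hours(hours_since_modified: Optional[float],
                 explicit_hours: Optional[float] = None,
                 lm_factor: float = 0.1,       # CacheLastModifiedFactor
                 max_expire: float = 24.0,     # CacheMaxExpire, in hours
                 default_expire: float = 1.0   # CacheDefaultExpire, in hours
                 ) -> float:
    """Estimate how many hours a cached page is served without rechecking."""
    if explicit_hours is not None:
        # The server supplied an expiry time; CacheMaxExpire still caps it.
        return min(explicit_hours, max_expire)
    if hours_since_modified is not None:
        # No expiry time: guess from the page's age, capped by CacheMaxExpire.
        return min(hours_since_modified * lm_factor, max_expire)
    # No date information at all: fall back to CacheDefaultExpire.
    return default_expire
```

For example, a page last modified 10 hours ago is kept for 10 × 0.1 = 1 hour. One that has not changed in 500 hours would get 50 hours, so the 24-hour cap applies.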

If you want to use the proxy server to block sites, you need to add the ProxyBlock directive. It can go basically anywhere in the apache.conf file, but I added it in the proxy section so that it would be easy to find. The format is simple, as you can see in this example:

ProxyBlock playboy.com penthouse.com adserver.ugo.com
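According to the mod_proxy documentation, ProxyBlock blocks sites whose names contain any of the listed words, hosts or domains. The sketch below models only that substring test. It ignores the IP-address lookups Apache also performs at startup. The function `is_blocked` is a hypothetical name, and the list is copied from the example above.

```python
from urllib.parse import urlsplit

BLOCK_LIST = ["playboy.com", "penthouse.com", "adserver.ugo.com"]


def is_blocked(url: str, block_list=BLOCK_LIST) -> bool:
    """True if the URL's host name contains any entry from the block list."""
    host = (urlsplit(url).hostname or "").lower()
    return any(word.lower() in host for word in block_list)
```

Because the match is on substrings, entries reach further than you might expect. "playboy.com" also catches a host such as notplayboy.com, and a short word like "ads" would block every host with "ads" anywhere in its name. Meanwhile, "adserver.ugo.com" leaves www.ugo.com alone.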

Any listed site will be blocked. Visiting it directly returns an error page, and any of its content embedded in other pages (ad banners, for instance) fails to load.

Once you've added the sites you would like blocked, save your changes. For debugging, it helps to list at least one site you can easily test, so you can verify that the proxy server is working correctly. If you like, run apachectl configtest to check the file for syntax errors. Then restart the web server with:

apachectl start

Then remember to leave the root shell (type exit)!

The final step is to tell your browser to use the proxy server. In Internet Explorer, go to Edit > Preferences, open the Proxies tab, and check the Web Proxy box. Enter 127.0.0.1 as the server address, with port 80, the port Apache listens on by default. You should now be good to go. Test it by trying to browse one of the sites you listed in the ProxyBlock statement; you should get an error page headed "Proxy Error".
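You can also check the proxy without touching the browser settings. The sketch below assumes Python 3 and assumes Apache listens on 127.0.0.1 port 80. The helpers `make_opener` and `status_via_proxy` are hypothetical names. It sends a request through the proxy and reports the HTTP status that comes back. I have not pinned down the exact status code mod_proxy uses for a blocked site, so the sketch simply returns whatever arrives. A blocked site should give an error status rather than 200.

```python
import urllib.error
import urllib.request

PROXY = "http://127.0.0.1:80"   # assumed address and port of the Apache proxy


def make_opener(proxy: str = PROXY) -> urllib.request.OpenerDirector:
    """Build an opener that sends plain-HTTP requests through the proxy."""
    return urllib.request.build_opener(urllib.request.ProxyHandler({"http": proxy}))


def status_via_proxy(url: str, proxy: str = PROXY, timeout: float = 10.0) -> int:
    """Return the HTTP status code the proxy answers with for url."""
    try:
        with make_opener(proxy).open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code          # error pages (e.g. a blocked site) land here
```

For example, `status_via_proxy("http://www.cnn.com/")` should return 200. The same call for a site on your ProxyBlock line should not.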

Although this is a long tutorial, setting up a proxy server is relatively trivial, and could save you some time and relieve some of your worries about what sites your kids are hitting in your absence.