LinuxDevCenter.com

Installing and Configuring Squid

Dos and Don'ts

  • DO put a name server on the machine with Squid. It's an extra level of caching, and minimizes choke points. Don't overload a single DNS server for a cluster of boxes, especially if that server has other things to do.
  • DO use the national root DNS closest to you as a forwarder. If someone else has looked up the same address recently, it should be cached. Do not use it as your only forwarder. Use at least one other machine on your network, and one on your provider's network.
  • DO increase the size of your fqdncache and ipcache. Bigger is better. Stale addresses are less important than many entries and long TTLs. Cache addresses for at least 24 hours, and negative cache for at least 5 minutes.
  • DON'T cache big objects. Next to CPU and RAM, disk I/O is your biggest bottleneck. Try not to cache anything over about a megabyte.
  • DO split your cache over several physical drives. Four 5-Gbyte drives are better than one 20-Gbyte drive -- you save time using multiple spindles. IDE will serve you as well as SCSI, unless you're using Ultra Wide SCSI.
  • DON'T put two cache drives on the same IDE controller, but you can put four high-speed or six ordinary-speed SCSI drives on a single SCSI controller. Don't mix Ultra-Wide SCSI drives with non-Ultra-Wide SCSI drives on the same chain.
  • DO keep your logs on a non-cache drive, and preferably on a different chain or controller.
  • DO test your hardware before trusting it. Some servers have incompatible or semi-compatible combinations of hardware: It's tragic to see a US$30,000 server with Ultra Wide SCSI that has poorer disk I/O throughput than a P100 with a 5400-rpm IDE drive. You may have to buy it before you can test it, but at least you can warn the boss before it goes into service.

Configuration for economy

Saving dollars is a large part of what proxy caches are all about. It's easy to waste your proxy -- and real money in bandwidth -- if you don't understand what's going on. It's also easy to save money with one if you know the issues. Most web servers and web content are operated or produced by people who don't really understand HTTP. As the cache administrator, you will find their errors landing on your shoulders.

Disk space and memory

A cache can always use more disk space, but as the size of your disk-cache grows, you will need more memory to index it. There's a straightforward rule for memory.

Divide the size of your disk cache by 13 Kbytes, and multiply that by 130 bytes. Add the size of cache_mem, and add about 2.5 Mbytes more for executable files, libraries, and other sundry overhead. For example: We have a 10-Gbyte drive, and a cache_mem of 8 Mbytes.

10 Gbytes/13 Kbytes = 769,230
769,230 x 130 bytes = 99,999,900 bytes (or 97,656 Kbytes)
97,656 Kbytes + 2,560 Kbytes (2.5 Mbytes) + 8,192 Kbytes (8 Mbytes) = 108,408 Kbytes, or about 106 Mbytes

The example server needs about 106 Mbytes available to Squid to support 10 Gbytes of cache_dir.
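The rule of thumb above can be sketched as a small Python function. This is only an illustration of the arithmetic, not anything Squid itself provides: the function name and parameter names are made up for this example, and the 13-Kbyte average object size, 130 bytes of index per object, and 2.5 Mbytes of overhead are the figures quoted above, exposed as defaults so you can adjust them.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def squid_memory_estimate(cache_dir_bytes, cache_mem_bytes,
                          avg_object_bytes=13 * KIB,
                          index_bytes_per_object=130,
                          overhead_bytes=2.5 * MIB):
    """Estimate the RAM (in bytes) Squid needs, using the article's rule of thumb.

    All names and defaults here are illustrative assumptions, not Squid settings.
    """
    # How many average-sized objects fit in the disk cache.
    objects = cache_dir_bytes / avg_object_bytes
    # Each cached object costs a fixed amount of index memory.
    index_bytes = objects * index_bytes_per_object
    return index_bytes + cache_mem_bytes + overhead_bytes


# The example server: a 10-Gbyte cache_dir and a cache_mem of 8 Mbytes.
estimate = squid_memory_estimate(10 * GIB, 8 * MIB)
print(f"{estimate / MIB:.1f} Mbytes")
```

Because this version uses binary units throughout, its result lands within a few Mbytes of the hand calculation above rather than matching it exactly; the gap comes from the hand version mixing decimal and binary units.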

Provide only as much disk space as you have RAM to support. Squid performs very badly when it starts to swap. Remember to set aside memory for anything else on the machine (DNS, cron, operating system, etc.).

Refresh patterns

Refresh patterns determine the lifetime of the object. Within an object lifetime, Squid will serve the object without sending an IMS ("If-Modified-Since") request. Once the lifetime is exceeded, Squid will keep the object but will send an IMS request to the origin server. If the object has been modified since it was first cached, Squid fetches the new copy. If not, it keeps the old copy. Either way, the object is marked as fresh again.
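To make that flow concrete, here is a minimal Python model of the decision described above. It is a sketch, not Squid's code: the CachedObject class, the serve function, and the three result strings are all invented for this illustration, and times are plain integers counted in minutes.

```python
from dataclasses import dataclass


@dataclass
class CachedObject:
    body: str
    last_modified: int  # origin's last-modified time, in minutes
    validated_at: int   # when the cache last fetched or revalidated it
    lifetime: int       # minutes the object counts as fresh


def serve(obj, now, origin_body, origin_last_modified):
    """Return the action taken for one request, updating obj in place."""
    if now - obj.validated_at <= obj.lifetime:
        # Still within its lifetime: answer from cache, no IMS request.
        return "served-fresh"
    # Lifetime exceeded: ask the origin with If-Modified-Since.
    if origin_last_modified > obj.last_modified:
        # Origin has a newer version (an HTTP 200 reply with a new body).
        obj.body = origin_body
        obj.last_modified = origin_last_modified
        action = "replaced"
    else:
        # Origin says nothing changed (an HTTP 304 reply); keep the old copy.
        action = "revalidated-unchanged"
    # Either way, the object is marked as fresh again.
    obj.validated_at = now
    return action
```

For instance, an object validated at minute 100 with a 60-minute lifetime is served straight from cache at minute 130, but a request at minute 200 triggers a validation first.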

Here's our (default) basic refresh pattern:

refresh_pattern . 0 20% 4320

The dot (.) is the regular expression pattern, and matches anything. It uses POSIX regular expressions. (See man 7 regex).

The zero (0) is the minimum freshness time. If it's anything other than zero, it will override any expiration headers given with the object. If the content provider actually provided an expiration header, we should usually honor it.

The last term (4320) is the maximum freshness time. The object becomes stale after this many minutes in the cache.

The 20 percent is used for our default case, for when there's no information from the content provider about the lifetime of the object. Squid takes x percent (20 percent in this example) of the difference between the last-modified time of the object and the current time, and uses that as the object lifetime. If the object lifetime is less than the minimum set by the refresh_pattern, it is increased to at least that. If it's greater than the supplied maximum, it's reduced to that.
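The percentage rule is easier to see with numbers. The following Python sketch implements the calculation exactly as described in the paragraph above, for an object that arrives with no expiration information. The function and parameter names are made up for this example, and all times are in minutes; it follows this article's description rather than Squid's source code.

```python
def object_lifetime(now, last_modified,
                    min_minutes=0, percent=20, max_minutes=4320):
    """Lifetime in minutes for an object with no expiration header.

    Mirrors 'refresh_pattern . 0 20% 4320' by default. Illustrative only.
    """
    # How old the object already was when we looked at it.
    age = now - last_modified
    # Take x percent of that age as the lifetime...
    lifetime = age * percent / 100
    # ...then clamp it between the minimum and the maximum.
    return max(min_minutes, min(lifetime, max_minutes))


DAY = 24 * 60  # minutes in a day

# Last modified 10 days ago: 20 percent of 10 days is 2 days (2880 minutes).
print(object_lifetime(now=10 * DAY, last_modified=0))

# Last modified 100 days ago: 20 days would exceed the maximum, so it is cut to 4320.
print(object_lifetime(now=100 * DAY, last_modified=0))
```

The idea behind the percentage is that an object that has not changed for a long time is unlikely to change soon, so it earns a longer lifetime, up to the ceiling you set.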

Non-standard files

Some kinds of files can be maintained much longer than others. Files ending in .zip, .tar.gz, .tgz, and .exe rarely change content without also changing name. Using regular expressions, we can create a set of refresh patterns like this:

refresh_pattern -i exe$ 0 50% 999999
refresh_pattern -i zip$ 0 50% 999999
refresh_pattern -i tar\.gz$ 0 50% 999999
refresh_pattern -i tgz$ 0 50% 999999

Refresh pattern options

Note that these options violate the HTTP standard. Do not use them lightly.

override-expire pretends there is no expiration header on the object and calculates purely based on last-modified times. This permits you to cache sites that abuse expiration headers, but also inhibits updates of frequently changed content (such as news sites).

ignore-reload prevents the object being refreshed when the user presses the refresh button on their browser. This does not perform well when the object has no content length -- you may wind up with a broken object that the users cannot reload.

reload-into-ims transforms reloads into validations. Beware: Web servers may permit an object to be updated without the last-modified time being altered. The server may then insist that the object is still valid when it actually is not.

More Dos and Don'ts

  • DO increase maximum_object_size. 40 Mbytes is not too large. 800 Mbytes might cache the large downloads.
  • DO make sure you have at least one or two local name servers for Squid to query. Don't let it query any other servers directly. This keeps Squid's offsite DNS requests to a minimum.
  • DO increase the ipcache_size and fqdncache_size.
  • DO have a parent proxy to query if you can. It's cheaper to talk through a hierarchy than directly to multiple sites.
  • DON'T use ICP if you have a single parent you always use.
  • DO use calamaris to analyze your logs periodically and look for changes you can make to your refresh_patterns. There is no single "good" set. Users change their browsing habits and create new files, and new technology is always being developed. Adapt.

Caveats and gotchas

  • In many clients, "reload" forces the cache to reload from the origin server. This can cause testing problems.

  • Don't test Squid with pages that have "do not cache" headers. Squid will not cache them.

  • Use the access.log when testing to verify that you're pulling pages from Squid.

  • Access control lists have an implicit last line which reverses the rule of the last explicit line.
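The last caveat catches many people, so here is a small Python model of it. This is a sketch of the behavior described above, not Squid's implementation: the check_access function, the rule format, and the dictionary-based request are all hypothetical, and refusing an empty rule list is simply a choice made for this example.

```python
def check_access(rules, request):
    """Evaluate (action, predicate) rules in order; action is 'allow' or 'deny'."""
    if not rules:
        raise ValueError("this sketch needs at least one explicit rule")
    for action, matches in rules:
        if matches(request):
            return action  # first matching line decides
    # Nothing matched: the implicit last line reverses the last explicit one.
    last_action = rules[-1][0]
    return "deny" if last_action == "allow" else "allow"


# A list ending in 'allow' implicitly denies everyone else.
allow_lan = [("allow", lambda r: r["src"].startswith("192.168."))]
print(check_access(allow_lan, {"src": "10.0.0.1"}))

# A list ending in 'deny' implicitly ALLOWS everyone else.
deny_one_host = [("deny", lambda r: r["src"] == "10.0.0.1")]
print(check_access(deny_one_host, {"src": "172.16.5.5"}))
```

The second case is the dangerous one. Ending every access list with an explicit catch-all line of your own makes the outcome obvious instead of implied.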

Final words

Squid can improve browsing speed and reduce HTTP bandwidth. The squid.conf file gives great flexibility, but can be initially daunting. These settings let you get started -- but are just a start. Experiment!

Further reading

  • Squid, A User's Guide
  • Squid configuration manual
  • $SQUID-HOME/etc/squid.conf
  • RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1

Jennifer Vesperman is the author of Essential CVS. She writes for the O'Reilly Network, the Linux Documentation Project, and occasionally Linux.Com.




What are you doing to optimize Squid?

  • Squid configuration
    2007-11-01 04:26:06  IRAQ

    I want to manage bandwidth on linux ES 4 machine. In my situation I have 256 kbps bandwidth and I want to distribute thin at a fixed rate to my users. How can I do this
  • Squid access pages direct from origin no caching is done
    2007-03-31 13:04:27  moodylicious

    HI
    hello all tech gurus I just installed Squid in Fedora Core 4 everything is working perfectly ok but
    no caching is done when a page is requested the proxy server fetches it direct from origin no matter how many times you try to access the same page

    Please help configure the caching side of squid
    coz I need the server to cache pages so that I can access them locally in my LAN

    Thnx
    Guys
  • what about of transparent proxy?
    2003-09-02 21:04:28  anonymous2

    are the --enable-heap-replacement --enable-cache-digest --enable-dlmalloc needed?

    I am trying to set bandwidth control. Is it possible with squid?

    Regards

    Valdo
    • what about of transparent proxy?
      2003-12-11 20:03:50  Jennifer Vesperman | O'Reilly Author

      The configuration options are not strictly needed - that is, after all, why they're options. I simply recommend the ones in the article as useful ones to have.

      I don't think you can use squid directly to set bandwidth control, but it is a useful tool for minimising the need for bandwidth control.

      As for transparent proxying, check http://linux.oreillynet.com/pub/a/linux/2001/10/25/transparent_proxy.html


      Jenn V.
  • Regarding: Installing and Configuring Squid
    2003-08-25 21:09:26  anonymous2

    It's just awesome and delivers the know-how for a newbie to install and run Squid... this article is great.....
    • Regarding: Installing and Configuring Squid
      2003-08-26 06:36:10  Jennifer Vesperman | O'Reilly Author

      You're welcome. I'm glad it is helpful.
  • How to merge different lease line connection to a single bandwidth on linux
    2002-12-04 19:52:20  anonymous2

    I have to configure 4 different lease line to a single PC having linux.
    I tried using squid but failed, can u help me in getting this task sorted
    • How to merge different lease line connection to a single bandwidth on linux
      2003-01-28 15:58:41  Jennifer Vesperman | O'Reilly Author

      Squid is an HTTP proxy only, it doesn't redirect anything other than HTTP traffic.

      It seems that what you're after is a router or a multiplexer, depending on what you mean by 'lease line'.


      Jenn V.
  • Minimum-effort emergency bypass options?
    2001-08-23 15:41:37  sharumpe

    Thank you for the great article! I am evaluating Squid for possible use in a campus environment, and one of the big concerns I have with any proxy is failure management.

    Is there a way to configure squid to simply pass ALL requests directly through without doing anything? Here's a scenario to illustrate my question:

    Say that a large group of users has been convinced to use the proxy for their Web browsing. Configuration of their browsers has been done and people are happy. Then one day something goes wrong (anything, really) and no one can get any Web sites. Obviously telling people to disable the proxy isn't the best solution, because they will never switch back once the problem is fixed.

    What I would like to do is be able to put another configuration into place (a separate installation) which would answer on the same port but would just get fresh copies every time.

    I apologize for the newbie nature of this question, but I don't know the proper terms to use to make my question any clearer.

    Thank you,
    Tim
    • Minimum-effort emergency bypass options?
      2001-11-18 18:15:52  Jennifer Vesperman | O'Reilly Author

      Thanks for the response - I like to hear when my articles are helpful.

      Your own response (munging the acl lists) should work as a pass-through. I like the autoproxy idea as well, it might be useful if you need to automate switching between a pass-through and a proxy configuration.



      Jenn Vesperman.


    • Followup: Minimum-effort emergency bypass options?
      2001-08-31 09:57:22  sharumpe

      I just thought I'd post a followup on what I found.

      It appears that if you put in a couple more ACL lines, you can achieve what I was looking for:

      acl HTTP proto HTTP
      always_direct allow HTTP

      This will cause all queries to go directly "through" the proxy, without checking the cache. That way, if the drive fails or something goes drastically wrong, you can have a fresh install of Squid, configured with this, to take over until you can get things figured out, and no reconfiguration of the client browser is necessary.
      • Followup: Minimum-effort emergency bypass options?
        2003-01-26 17:54:42  Jennifer Vesperman | O'Reilly Author

        Sorry I hadn't been checking, but that sounds like an excellent solution.


        Jenn V.
    • Minimum-effort emergency bypass options?
      2001-08-23 17:50:25  turpie

      Check out Proxy Auto-Config Files at http://home.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
      You can use these to specify fallback settings just in case the proxy is down.


©2009, O'Reilly Media, Inc.