There’s a thread from the Linux Kernel Mailing List which seems to have gotten picked up all over the place. Most sources seem to be including the quote:

Nick Piggin explained that swap can improve performance no matter how much RAM you have, “well it is a magical property of swap space, because extra RAM doesn’t allow you to replace unused memory with often used memory. The theory holds true no matter how much RAM you have. Swap can improve performance. It can be trivially demonstrated.” This said, numerous Linux users do report success running a swapless system.

But not very many have done anything to shed light on what’s meant there. I’ll try to give a brief example which hopefully will explain. What’s meant by “replace unused memory with often used memory” is that the VM subsystem should be able to replace a necessary but infrequently accessed page with a page that gets used very frequently. As an example, let’s say that you have 256 meg of memory in your desktop system. 128 meg is being used by a GNU Chess game you were in the middle of, but paused because you were struck by a sudden urge to rework a program that processes XML files and turns them into images. Let’s say that the XML files in your test set comprise 192 meg of data. They don’t all have to be in memory at the same time, but the full set of file data comprises 192 meg worth of space.

Now, to understand the behavior differences between the with-swap and without-swap cases it’s necessary to understand the concept of caching that operating systems like Linux practice. Caching in this case takes the form of the operating system leaving information from the disk sitting in RAM as long as there’s free memory. This keeps the operating system from having to go back to the slow disk to get a file if you end up reusing the file. In an ideal world the RAM in a system would be large enough so that it never gets filled up, but this isn’t the case. So the operating system tries to keep as much information from the disk sitting in RAM as it can. This way if I grep a file for “mike” and then with the next command grep the file for “rocks”, the operating system doesn’t have to read the file from disk twice. Assuming it has some free RAM to use, after my first command it will keep the file contents sitting in memory. Then when I type my second command it sees the copy in memory already and skips the disk operations.
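That caching behavior can be sketched in a few lines of Python. This is a toy model of my own invention (a hypothetical PageCache class and an in-memory stand-in for the disk), nothing like the kernel’s real implementation, but it shows the shape of the idea:

```python
class PageCache:
    """Toy model: keep file contents sitting in RAM after the first read."""
    def __init__(self):
        self.cache = {}       # filename -> contents held in RAM
        self.disk_reads = 0   # trips to the slow disk

    def read(self, name, disk):
        if name not in self.cache:        # cache miss: go to disk once
            self.disk_reads += 1
            self.cache[name] = disk[name]
        return self.cache[name]           # cache hit: served from RAM

disk = {"notes.txt": "mike rocks\n"}      # stand-in for the real disk
pc = PageCache()
"mike" in pc.read("notes.txt", disk)      # first grep: has to read the disk
"rocks" in pc.read("notes.txt", disk)     # second grep: served from memory
print(pc.disk_reads)                      # 1 -- the disk was only touched once
```

Two greps, one disk read: the second command finds the copy already in memory and skips the disk entirely.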

First, let’s assume that you have no swap at all in your system. You can still do this work, no problem, as long as all 192 meg of your data files don’t have to be in memory at the same time, which we’re assuming they don’t. It would be somewhat slow, however, because after processing 128 meg of your input files the system would have run out of memory. It would have to start dropping the first files from the set in order to read in the last 64 meg of contents. So at the end of processing, the cache contains the final 128 meg of your data files; the first 64 meg have been dropped from the cache. But now you fix a few bugs in your program and run your tests again. The first 64 meg of data needs to be read in again, producing more disk operations. Let’s say you cycle over and over again running your program on this full set of data. With every run the disk is going to have to get accessed, because not all of your files will fit into RAM with your suspended GNU Chess program sitting there. This is what people mean when they say that “the full working set” doesn’t fit in memory. It means that you can run your program, but that the system has to use the disk when it wouldn’t have had to if you had more memory.
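The arithmetic can be made concrete with a toy simulation of the swapless case. The model is hypothetical (1-meg files and a strict LRU cache, with the numbers taken from the example above), and one wrinkle is worth flagging: with strict LRU, a sequential scan bigger than the cache actually misses on every single file, which is even worse than the simplified “reread the first 64 meg” accounting. The conclusion is the same, though: every run hits the disk.

```python
from collections import OrderedDict

RAM_MB, CHESS_MB, DATA_MB = 256, 128, 192  # numbers from the example
cache_capacity = RAM_MB - CHESS_MB         # only 128 meg left for the cache

cache = OrderedDict()                      # LRU cache of 1-meg files
disk_reads_per_run = []
for run in range(3):                       # rerun the XML job three times
    reads = 0
    for f in range(DATA_MB):               # sequential pass over the files
        if f in cache:
            cache.move_to_end(f)           # hit: refresh LRU position
        else:
            reads += 1                     # miss: read the file from disk
            cache[f] = None
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict least recently used
    disk_reads_per_run.append(reads)

print(disk_reads_per_run)   # [192, 192, 192]: every run rereads from disk
```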

Now, let’s say that you have your system set up with 128 meg worth of swap space in addition to the 256 meg of RAM. Once again, assuming this is an ideal situation, when you run out of RAM the virtual memory subsystem looks at the currently used memory and says “hey, this GNU Chess program isn’t being used”. So instead of dropping your first 64 meg of data files out of the cache, it uses the swap space to move 64 meg of the suspended GNU Chess program off to disk. Now all 192 meg of your data files fit in RAM. You run your program over and over again now, and no disk operations need to be performed. Eventually, when you start using GNU Chess again, those 64 meg moved off to swap need to be reloaded. But if we assume that you’ve run your tests multiple times, the net result is a savings. This is what was meant by replacing unused memory with frequently used memory. Swap gives the virtual memory subsystem somewhere to put pages that it thinks won’t be used for a long time, so it can use that space to speed up the operations that are actually going on right now.
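This trade can also be put into numbers with a toy simulation (again a hypothetical model of my own: 1-meg files, an LRU page cache, and 64 meg of idle GNU Chess pages assumed pushed out to swap, freeing that RAM for the cache):

```python
from collections import OrderedDict

RAM_MB, CHESS_MB, DATA_MB = 256, 128, 192  # numbers from the example
chess_swapped_mb = 64                      # idle chess pages moved to swap
cache_capacity = RAM_MB - (CHESS_MB - chess_swapped_mb)  # 192 meg of cache

cache = OrderedDict()                      # LRU cache of 1-meg files
disk_reads_per_run = []
for run in range(3):                       # rerun the XML job three times
    reads = 0
    for f in range(DATA_MB):               # sequential pass over the files
        if f in cache:
            cache.move_to_end(f)           # hit: refresh LRU position
        else:
            reads += 1                     # miss: read the file from disk
            cache[f] = None
            if len(cache) > cache_capacity:
                cache.popitem(last=False)  # evict least recently used
    disk_reads_per_run.append(reads)

print(disk_reads_per_run)   # [192, 0, 0]: only the first run touches the disk
print(chess_swapped_mb)     # one-time cost, paid later, to reload chess
```

Only the first run reads from disk; every rerun is served entirely from RAM. The 64 meg of swap-in when you resume GNU Chess is a one-time cost, which repeated test runs quickly amortize.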