3

I share a server with HAL. The server has 32 GB of memory.

I rarely use more than 1 GB of memory, and when I do, it is for a few minutes at a time, and I don't mind sending such jobs to the back of the line.

HAL reads/writes large files (e.g. using gunzip). This can take up to 100% of the CPU, intermittently, for hours. It is usually done overnight, but while it runs, even simple commands such as cd take 30 seconds, and opening emacs can take minutes.

I would like to be able to reserve 1 GB for use by processes that use << 1 GB (like a text editor). I would also like to stay out of HAL's way, and I see no reason why this should be an issue.

HAL says that a queueing system (like PBS) cannot be used to give read/write operations a low priority, e.g. to keep 1 GB of memory available while large jobs are running. In his words:

the script used to gunzip snags all the processors it can because the data is large... queueing would not solve this... during transfer of files from (that server) to (this server), an inflation step does lots of read/write

Why couldn't queuing solve this problem? What could?

masegaloeh
  • At the cost of some overhead, virtualization can solve this problem. – rvs Oct 11 '11 at 17:49
  • *Smells* like a homework question – Tablemaker Oct 11 '11 at 18:16
  • @Shads0 the smell is probably because a) I used the name 'Bob' to protect the innocent, and b) because I am a scientist trying to play nicely with programmers but have not developed fluency in programmer talk. For more info, see the link to my [homepage](http://www.openwetware.org/wiki/User:David_S_LeBauer) in my profile, or see [the work in question](http://128.174.125.122/wiki/index.php/PECAn_Documentation:Use) – David LeBauer Oct 11 '11 at 18:25
  • Urbana? As in where HAL came online? – Bart Silverstrim Oct 11 '11 at 20:36
  • If so I refuse to acknowledge this question's existence until Bob is changed to HAL. – Bart Silverstrim Oct 11 '11 at 20:37
  • @Bart done. Prior to [HAL](http://en.wikipedia.org/wiki/HAL_9000), Urbana was home to [ENIAC](http://en.wikipedia.org/wiki/ENIAC), [ORDVAC](http://en.wikipedia.org/wiki/ORDVAC), and [ILLIAC](http://en.wikipedia.org/wiki/ILLIAC), as well as [UNIX License #1, Telnet, and other achievements](http://cs.illinois.edu/csillinois/history) – David LeBauer Oct 11 '11 at 21:05
  • ENIAC didn't go into space and sing "Bicycle Built for Two." A difficult tune to carry. – Bart Silverstrim Oct 11 '11 at 21:16
  • And you changed it to HAL! :-) – Bart Silverstrim Oct 11 '11 at 21:16
  • @BartSilverstrim it reads better with HAL as my nemesis; (I am now inclined to change my user name to Dave). In any case, I was incorrect about ENIAC, which was built at U. Penn. Other info is correct. – David LeBauer Oct 11 '11 at 21:23

2 Answers

5

You could have a job queuing system or modify the kernel's scheduling approach.

I'm going to ignore those options and suggest that you use ionice -- or, more specifically, that HAL uses it to lower his priority. It sounds like you're having a disk access issue rather than a memory issue.

Regular nice may also be an option, as it will indirectly affect disk priority (from the ionice man page: "The priority within the best effort class will be dynamically derived from the cpu nice level of the process: io_priority = (cpu_nice + 20) / 5."). The software atop is also really handy for getting an overview of what's bottlenecking, and whether it's regular I/O or swapping to disk that is at issue.
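To make that concrete, here's a sketch (the filename is made up, and this assumes the util-linux ionice and the atop package are installed):

```
# nice 19 is the lowest CPU priority; per the man-page formula above it maps
# to best-effort I/O priority (19 + 20) / 5 = 7 (integer division), also the lowest.
nice -n 19 gunzip large-dataset.gz

# atop, refreshing every 2 seconds, highlights whichever resource is
# saturated (CPU, disk, or swap), so you can tell which limit you're hitting.
atop 2
```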

Jeff Ferland
  • if I understand correctly, because this is a disk access issue, it is not something that can/should be effectively handled by a queue? – David LeBauer Oct 11 '11 at 18:38
  • @Dave It's more that implementing a job queuing system is an intensive approach to a more easily solved problem. – Jeff Ferland Oct 12 '11 at 15:07
5

HAL read/writes large files (e.g. using gunzip). This can take up to 100% of the memory, intermittently, for hours. This is usually done overnight, but when running, will make even simple commands such as cd take 30s, opening emacs can take minutes.

First, gzip and gunzip do not work the way you think they do -- the algorithm gzip uses is block-based. While the process may grow slightly when chugging through a large compressed file, even uncompressing a 1 GB .gz file only chews up about 15 MB of RAM (total process size) on my machine.
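If you want to check this on your own machine, GNU time (the standalone /usr/bin/time binary, not the shell builtin) can report a process's peak memory use; the filename below is just a placeholder:

```
# -v makes GNU time print verbose stats on stderr, including
# "Maximum resident set size (kbytes)" for the command it ran.
# gunzip -c decompresses to stdout, which we redirect to a file.
/usr/bin/time -v gunzip -c big-file.gz > big-file
```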

Second, unless you're sucking the entire file into RAM, simply reading or writing a large file won't chew up much memory - the OS may hold the data in the filesystem cache, but cached data will be evicted the moment a program needs that RAM. Only data held in a program's working memory counts toward "memory pressure" (used RAM, plus or minus a few other factors).
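You can see this split in free's output; the "-/+ buffers/cache" line shows what's left once the evictable cache is discounted (the numbers below are invented for a 32 GB box, purely for illustration):

```
$ free -m
             total       used       free     shared    buffers     cached
Mem:         32000      31500        500          0        200      28000
-/+ buffers/cache:       3300      28700
Swap:         2000          0       2000
```

The Mem: line makes the box look nearly full, but only ~3.3 GB is actually claimed by programs; the rest is cache the kernel will happily hand back.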


I would like to be able to reserve 1 GB for use by processes that use << 1 GB (like a text editor). I would also like to stay out of HAL's way, and see no reason that this should be an issue.

Stop trying to outsmart your operating system's pager: The kernel will swap out tasks to ensure that whoever is currently executing has RAM in which to work. Yes, this means you will be hitting disk if you're using more RAM than you have available. The solution is to limit the amount of RAM you're using by running fewer programs, or to add more RAM.
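A quick way to tell whether the machine is actually swapping (as opposed to just doing heavy file I/O) is vmstat:

```
# Print stats every second; sustained non-zero si (swap-in) and
# so (swap-out) columns mean the box is thrashing, not just reading files.
vmstat 1
```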

The concept of "reserving" RAM is fundamentally flawed from an OS design perspective: You could have no other activity going on, but HAL's program can't touch the "reserved" RAM, so now it has to go and swap to disk. For want of (e.g.) 1 KB, HAL's program is now making constant disk hits paging data in and out of RAM, and your performance goes through the floor.

You can artificially limit HAL's RAM usage (ulimit), but when he hits the hard limit his programs will probably not react well (think: malloc(): Unable to allocate XXXXX bytes followed by an ungraceful exit).
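As a sketch of that failure mode (the limit value and the job name are arbitrary):

```
# Cap virtual memory at 100 MB (ulimit -v takes kilobytes) for everything
# started in this subshell; an allocation past the cap makes malloc()
# return NULL, which most programs answer with an ungraceful exit.
(ulimit -v 102400; ./some-big-job)
```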

You can, as rvs mentioned in their comment, virtualize the environment and ensure that HAL's processes only have access to the resources available to their VM, but this simply moves the problem (HAL's VM will begin swapping, and swapping in a VM is, by necessity, even slower than on bare metal).


In the Real World, Jeff is probably right - you're hitting disk I/O limits rather than RAM limits: Decompressing files involves a huge amount of disk I/O (read in from the compressed file, pass it through the CPU and a tiny bit of RAM, spit it out to the uncompressed file). nice (to affect CPU priority) and ionice, if supported (to affect disk priority), will alleviate your problem.
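Combined, a low-impact overnight run might look something like this (the filename is again a placeholder, and the idle I/O class assumes the CFQ I/O scheduler is in use):

```
# Lowest CPU priority plus idle-class I/O (ionice -c 3): the decompression
# only gets disk time when no other process is asking for it.
nice -n 19 ionice -c 3 gunzip large-dataset.gz
```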


Lecture

Not for nothing, but I recall this same question from my Operating System design textbook (although the example wasn't gzip/gunzip). While there's a slim chance you're actually encountering this problem in the real world, I have my doubts: It's simply too contrived an example. Remember that Server Fault is for system administrators and desktop support professionals, people who manage or maintain computers in a professional capacity - not for CS/301 homework (FAQ).

If this is an actual real-world problem, then I apologize - you may be the 1 in 10,000 who actually encountered this corner case in a production environment.

voretaq7
  • thanks for the apology in advance; perhaps a quick look at my profile would have clarified the work that I do. More importantly, make sure not to confuse the probability that this specific homework question would appear (your estimate of 1 in 10,000 would suggest that it should have already appeared anyway) with the probability that one of the ~100,000 questions on this site is similar to one that previously appeared in your textbook. – David LeBauer Oct 11 '11 at 18:35
  • I realized that the job was using 100% of the CPU, _not_ 100% of the memory, and have updated the question. – David LeBauer Oct 11 '11 at 18:45
  • @voretaq7 I've seen disk contention issues often. The disk is so busy with somebody else's read / write that it takes a full second for ls to list a directory. Heck, just trying to open applications immediately upon boot of my laptop is a drag. – Jeff Ferland Oct 11 '11 at 18:45