logging RAM memory high-water mark of a Linux process

6

4

I would like to log the RAM memory high-water mark of a Linux process (kernel 3.2.0-36) by the time the process finishes. I am calling the process inside a Perl script. Something like:

my $cmd = "logmemory -o mem.log mycmd options 1>cmd.out 2>cmd.err";
unless(system("$cmd") == 0) { die $!; }

Any ideas?

719016

Posted 2012-12-06T09:39:48.333

Reputation: 2 899

Which kernel version are you running? – jaume – 2013-02-07T14:50:24.620

@jaume: I am using kernel 3.2.0-36. Gràcies :-) – 719016 – 2013-02-07T15:28:51.093

Answers

13

Take a look at /proc/[pid]/status, specifically this parameter.

  • VmHWM: Peak resident set size ("high water mark").

Alternatively, you can use /usr/bin/time -v command. Here's an example of its out:

Command exited with non-zero status 1
    Command being timed: "xz -9ek access_log.3 access_log.xz"
    User time (seconds): 6.96
    System time (seconds): 0.34
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.34
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
  **Maximum resident set size (kbytes): 383456**
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 24000
    Voluntary context switches: 3
    Involuntary context switches: 225
    Swaps: 0
    File system inputs: 0
    File system outputs: 0
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 1

Mxx

Posted 2012-12-06T09:39:48.333

Reputation: 2 659

3A difficulty with the /proc/[pid]/status approach is that it relies on the process being still running, which means that you can never use it to measure right to the end of the process's execution. This is particularly problematic if you're hoping to catch a process when it's about to exhaust system ram, and it then might get oom killed before you can poll its memory use. – mc0e – 2015-12-06T12:04:49.760

6

The RAM high-water mark information for a process is already collected for you by the kernel (from man proc):

/proc/[pid]/status
Provides much of the information in /proc/[pid]/stat and /proc/[pid]/statm in a format that's easier for humans to parse.
(...)
* VmHWM: Peak resident set size ("high water mark").
(...)

The tricky part is that this value should be read an instant before the process terminates.

I tried different approaches (more on that at the end of the answer) and the one that worked for me was an implementation in C:

  • logmemory invokes fork() to create a child process.

  • The child process calls ptrace() so that the parent process (which is logmemory) is notified every time the child executes a system call.

  • The child process uses execvp() to run mycmd.

  • logmemory patiently waits for a notification. When that's the case, it checks whether mycmd invoked exit_group. If that's the case, it reads /proc/<pid>/status, copies the values to mem.log and detaches from the child. Otherwise, logmemory allows mycmd to continue and waits until the next notification.

The downside is that the ptrace() slows down the monitored program, I show some comparisons below.

This version of logmemory not only logs VmHWM but also:

  • VmPeak (peak virtual memory size, which includes all code, data and shared libraries plus pages that have been swapped out and pages that have been mapped but not used)

  • a timestamp

  • the command name and arguments

This is the code, which can be surely improved - I'm not proficient in C. It works as intended, though (tested on a 32-bit Ubuntu 12.04 and a 64-bit SuSE Linux Enterprise Server 10 SP4):

// logmemory.c
#include <stdio.h>
#include <sys/ptrace.h>
#include <unistd.h>
#include <syscall.h>
#include <sys/reg.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define STRINGLENGTH 2048

int main(int argc, char **argv)
{   
    pid_t child_pid;
    long syscall;
    int status, index;
    FILE *statusfile, *logfile;
    char opt, statusfile_path[STRINGLENGTH], line[STRINGLENGTH], command[STRINGLENGTH], logfile_path[STRINGLENGTH] = "";
    time_t now;
    extern char *optarg;
    extern int optind;

    // Error checking
    if (argc == 1) {
        printf("Error: program to execute is missing. Exiting...\n");
        return 0;
    }
    // Get options
    while ((opt = getopt (argc, argv, "+o:")) != -1)
        switch (opt) {
            case 'o':
                strncpy(logfile_path, optarg, 2048);
                break;
            case ':':
                fprintf (stderr, "Aborting: argument for option -o is missing\n");
                return 1;
            case '?':
                fprintf (stderr, "Aborting: only valid option is -o\n");
                return 1;
    }
    // More error checking
    if (!strcmp(logfile_path, "")) {
        fprintf(stderr, "Error: log filename can't be empty\n");
        return 1;
    }
    child_pid = fork();
    // The child process executes this:
    if (child_pid == 0) {
        // Trace child process:
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        // Execute command using $PATH
        execvp(argv[optind], (char * const *)(argv+optind));

    // The parent process executes this:
    } else {
        // Loop until child process terminates
        do {
            // Set ptrace to stop when syscall is executed
            ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
            wait(&status);
            // Get syscall number
            syscall = ptrace(PTRACE_PEEKUSER, child_pid,
#ifdef __i386__
                          4 * ORIG_EAX,
#else
                          8 * ORIG_RAX,
#endif
                          NULL);
        } while (syscall != SYS_exit_group);

        // Construct path to status file and check whether status and log file can be opened
        snprintf(statusfile_path, STRINGLENGTH, "/proc/%d/status", child_pid);
        if ( !(logfile = fopen(logfile_path, "a+")) || !(statusfile = fopen(statusfile_path, "r")) ) {
            ptrace(PTRACE_DETACH, child_pid, NULL, NULL);
            return 1;
        }

        // Copy timestamp and command to logfile
        now = time(NULL);
        fprintf(logfile, "Date: %sCmd: ", asctime(localtime(&now)));
        for (index = optind; index < argc; index++)
           fprintf(logfile, " %s", argv[index]);
        fprintf(logfile, "\n");

        // Read status file line by line and copy lines containing VmPeak and VmHWM to logfile
        while (fgets(line, STRINGLENGTH, statusfile)) {
            if (strstr(line,"VmPeak") || strstr(line,"VmHWM"))
                fprintf(logfile, "%s", line);
        }
        fprintf(logfile, "\n");

        // Close files
        fclose(statusfile);
        fclose(logfile);

        // Detach from child process
        ptrace(PTRACE_DETACH, child_pid, NULL, NULL);
    }
    return 0;
}

Save it as logmemory.c and compile like this:

$ gcc logmemory.c -o logmemory

Run it like this:

$ ./logmemory 
Error: program to execute is missing. Exiting...
$ ./logmemory -o mem.log ls -l
(...)
$ ./logmemory -o mem.log free
             total       used       free     shared    buffers     cached
Mem:       1025144     760660     264484          0       6644     143980
-/+ buffers/cache:     610036     415108
Swap:      1046524     544228     502296
$ ./logmemory -o mem.log find /tmp -name \*txt
(...)
$ cat mem.log
Date: Mon Feb 11 21:17:55 2013
Cmd:  ls -l
VmPeak:     5004 kB
VmHWM:      1284 kB

Date: Mon Feb 11 21:18:01 2013
Cmd:  free
VmPeak:     2288 kB
VmHWM:       448 kB

Date: Mon Feb 11 21:18:26 2013
Cmd:  find /tmp -name *txt
VmPeak:     4700 kB
VmHWM:       908 kB

I wrote this C program to test logmemory's accuracy:

// bigmalloc.c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define ITERATIONS 200
int main(int argc, char **argv)
{
    int i=0;
    for (i=0; i<ITERATIONS; i++) {
        void *m = malloc(1024*1024);
        memset(m,0,1024*1024);
    }
    return 0;
}

Compile as usual and run it inside logmemory:

$ gcc bigmalloc.c -o bigmalloc
$ ./logmemory -o mem.log ./bigmalloc
$ tail mem.log

Date: Mon Feb 11 21:26:01 2013
Cmd:  ./bigmalloc
VmPeak:   207604 kB
VmHWM:    205932 kB

which correctly reports 200 MB used.

As a side note: time (at least on Ubuntu 12.04) surprisingly outputs a value that largely differs from what the kernel reports:

$ /usr/bin/time --format %M ./bigmalloc
823872

where M (from man time):

M   Maximum resident set size of the process during its lifetime, in Kilobytes.

As mentioned above, this comes at a price, because logmemory slows down the execution of the monitored program, for example:

$ time ./logmemory -o mem.log ./bigmalloc
real    0m0.288s
user    0m0.000s
sys     0m0.004s
$ time ./bigmalloc
real    0m0.104s
user    0m0.008s
sys     0m0.092s

$ time find /var -name \*log
(...)
real    0m0.036s
user    0m0.000s
sys     0m0.032s
$ time ./logmemory -o mem.log find /var -name \*log
(...)
real    0m0.124s
user    0m0.000s
sys     0m0.052s

Other approaches that I (unsuccessfully) tried were:

  • A shell script that creates a background process to read /proc/<pid>/status while mycmd runs.

  • A C program that forks and exec's mycmd but pauses until the child is a zombie, so avoiding ptrace and the overhead it creates. Nice idea, I thought, unfortunately VmHWM and VmPeak are no longer available from /proc/<pid>/status for a zombie.

jaume

Posted 2012-12-06T09:39:48.333

Reputation: 4 947

1

Your answer is great! I was also considering to read /proc/<pid>/status of a zombie process, but as you have said, unfortunately, this does not work. Still I didn't want to use ptrace or anything that will slowdown a program, so I searched and searched and searched for the solution... and I have found it! It is https://github.com/gsauthof/cgmemtime

– Vlad Frolov – 2015-03-23T19:24:55.647

2

Even though the topic is quite old, I want to share another project that emerged from the cgroups Linux kernel feature.

https://github.com/gsauthof/cgmemtime:

cgmemtime measures the high-water RSS+CACHE memory usage of a process and its descendant processes.

To be able to do so it puts the process into its own cgroup.

For example process A allocates 10 MiB and forks a child B that allocates 20 MiB and that forks a child C that allocates 30 MiB. All three processes share a time window where their allocations result in corresponding RSS (resident set size) memory usage.

The question now is: How much memory is actually used as a result of running A?

Answer: 60 MiB

cgmemtime is the tool to answer such questions.

Vlad Frolov

Posted 2012-12-06T09:39:48.333

Reputation: 219