Essentially, my question is related to memory allocation for Solaris virtual machines.

I am running a couple of old Sun ONE 6 Java web servers on two Solaris 8 virtual machines. I see that there's a reasonable amount of swap space being used, but I'm not exactly sure if this could indicate a need to add more RAM to these machines.

At peak service hours (usually mornings), the response time of the web application these servers host climbs to as much as 11 seconds, which is poor for a relatively simple page load. The average response time off-peak is about 5 seconds.

What would you be able to infer about the RAM usage for these machines from the output below? Is this information reasonably sufficient? Or would I need to run some other commands to rule out server memory starvation?

Finally, since there is a Java application at the core of the setup, I've also thought about:

1) Tracing the heap's object allocation to detect potential memory leaks.

2) Doing some performance profiling to see whether this is instead related to networking delays. I mention this because the application talks to a single Oracle database, but I doubt this is the cause since the two are pretty close from a network segmentation perspective.

I appreciate any kind of insight and feedback you could provide.

Thanks for your time and help.

Server 1:

40 processes:  38 sleeping, 1 zombie, 1 on cpu
CPU states: 99.1% idle,  0.4% user,  0.4% kernel,  0.0% iowait,  0.0% swap
Memory: 2048M real, 295M free, 865M swap in use, 3788M swap free

 12676 webservd 112  29   10  616M  242M sleep  103:37  0.48% webservd
 18317 root       1  59    0   23M   19M sleep   67:24  0.08% perl
  9479 support    1  59    0 6696K 2448K cpu/1    0:11  0.05% top
  8012 root      10  59    0   34M  704K sleep   80:54  0.04% java
  1881 root      33  29   10  110M   13M sleep   33:03  0.02% webservd
  7808 root       1  59    0   83M   67M sleep    7:59  0.00% perl
  1461 root      20  59    0 5328K 1392K sleep    6:49  0.00% syslogd
  1691 root       2  59    0   27M  680K sleep    4:22  0.00% webservd
 24386 root       1  59    0   15M   11M sleep    2:50  0.00% perl
 23259 root       1  59    0   11M 4240K sleep    2:42  0.00% perl
 24718 root       1  59    0   11M 5464K sleep    2:29  0.00% perl
 22810 root       1  59    0   19M   11M sleep    2:21  0.00% perl
 24451 root       1  53    2   11M 3800K sleep    2:18  0.00% perl
 18501 root       1  56    1   11M 3960K sleep    2:18  0.00% perl
 14450 root       1  56    1   15M 6920K sleep    1:49  0.00% perl

Server 2:

 42 processes:  40 sleeping, 1 zombie, 1 on cpu
CPU states: 98.8% idle,  0.4% user,  0.8% kernel,  0.0% iowait,  0.0% swap
Memory: 1024M real, 31M free, 554M swap in use, 3696M swap free

  5607 webservd  74  29   10  284M  173M sleep   20:14  0.21% webservd
 15919 support    1  59    0 4056K 2520K cpu/1    0:08  0.09% top
 13138 root      10  59    0   34M 1952K sleep  210:51  0.08% java
 13753 root       1  59    0   22M   12M sleep  170:15  0.07% perl
 22979 root      33  29   10  112M 7864K sleep   85:07  0.04% webservd
 22930 root       1  59    0 3424K 1552K sleep   17:47  0.01% xntpd
 22978 root       2  59    0   27M 2296K sleep   10:49  0.00% webservd
 13571 root       1  59    0 9400K 5112K sleep    5:52  0.00% perl
  5606 root       2  29   10   29M 9056K sleep    0:36  0.00% webservd
 15910 support    1  59    0 9128K 2616K sleep    0:00  0.00% sshd
 13106 root       1  59    0   82M 3520K sleep    7:47  0.00% perl
 13547 root       1  59    0   12M 5528K sleep    6:38  0.00% perl
 13518 root       1  59    0 9336K 3792K sleep    6:24  0.00% perl
 13399 root       1  56    1 8072K 3616K sleep    5:18  0.00% perl
 13557 root       1  53    2 8248K 3624K sleep    5:12  0.00% perl

To figure out whether your servers are short of RAM, a useful metric is the sr (scan rate) column in the vmstat output. Run something like vmstat 10 10 (10 samples, 10 seconds apart) during both reference and peak periods and post the output. The output of swap -s would also be useful. As an alternative to vmstat, you might prefer to run sar -g 5 5.

In any case, Server 2 seems to lack RAM according to the top output. Solaris also has a supported command similar to top that might help identify the largest virtual and physical memory consumers:

prstat -s rss -n 5
prstat -s size -n 5
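
The sr check described above could be automated with a small filter. This is only a sketch under assumptions: sr_summary is a hypothetical helper name, the vmstat header is assumed to label the scan-rate column sr, and a POSIX-style awk is assumed (on older Solaris releases nawk may be needed instead of the default awk). It ignores the first sample, which vmstat reports as an average since boot.

```shell
#!/bin/sh
# sr_summary: hypothetical helper that summarises the "sr" (scan rate)
# column of vmstat output supplied on standard input.
sr_summary() {
    awk '
    BEGIN { col = 0; n = 0; samples = 0; nonzero = 0; max = 0; sum = 0 }
    {
        # A header line is any line with a field named "sr"; remember
        # which column it is instead of hard-coding a position.
        header = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "sr") { col = i; header = 1 }
        }
        if (header || col == 0) next

        # Skip banner lines and anything else without a number there.
        if ($col !~ /^[0-9]+$/) next

        # The first data line is the average since boot, so ignore it.
        n++
        if (n == 1) next

        samples++
        sum += $col
        if ($col + 0 > max) max = $col + 0
        if ($col + 0 > 0) nonzero++
    }
    END {
        if (samples == 0) { print "no samples"; exit 1 }
        printf "samples=%d nonzero=%d max=%d avg=%.1f\n", samples, nonzero, max, sum / samples
    }'
}
```

Usage would be along the lines of vmstat 10 10 | sr_summary. If nonzero stays at 0 during peak hours, the page scanner is idle; repeated nonzero samples point toward memory pressure.
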
The things that stand out to me in these snapshots are the following:

  • Lots of perl processes
  • Multiple webservd processes
  • Machines are 98% and 99% idle

These facts lead to the following questions...

  • Can you reduce the number of perl processes?
  • I suppose there's no way to switch to a threaded web server model?
  • What does top show when the machines are under stress?

Finally, I'd do the following to track this down:

  • Use a network sniffer like Wireshark to see what portion of the HTTP process is actually being held up. Is it the connection? Is it the delivery of the page? Is it the delivery of a dynamic portion of the page?
  • Get an HTTP stress tool and load your web servers to see how they react. Watch the results with vmstat and top; I like using screen in a terminal to do this.

Good luck!

I've always found that the easiest way to track memory usage is system activity accounting. Usage can jump around a lot, so it's important to review at least a week of data to see the pattern.

Edit the "sys" crontab, and you'll see some commented-out entries that run the script /usr/lib/sa/sa1. How often it runs determines the time resolution of the accounting data saved. I usually do something like this for a 24x7 system:

20,40 * * * * /usr/lib/sa/sa1

That will store statistics in /var/adm/sa, one file per day of the month. You can then use sar to dump the memory stats for any of the days stored there. Say the 3rd was a peak day for me:

sar -f /var/adm/sa/sa03 -g
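
Because the files are named after the day of the month, the path for any given day can be derived mechanically. A minimal sketch, where sa_file is a hypothetical helper name, the directory is the /var/adm/sa path mentioned above, and an awk that supports -v is assumed (nawk or a POSIX awk):

```shell
#!/bin/sh
# sa_file: hypothetical helper that prints the sar data file for a
# given day of the month (1-31), zero-padded as in /var/adm/sa/sa03.
sa_file() {
    awk -v day="$1" 'BEGIN {
        # Reject anything that is not a plausible day number.
        if (day !~ /^[0-9]+$/ || day + 0 < 1 || day + 0 > 31) exit 1
        printf "/var/adm/sa/sa%02d\n", day
    }'
}
```

With a POSIX shell this could be used as sar -f "$(sa_file 3)" -g, which is the same command as the one above.
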

The column of primary interest is pgscan/s. If that number stays above 200 for long periods, the system does not have enough memory. At around 100, you'll probably benefit from more memory, but the degradation isn't severe. These days, with disk swap so much slower than memory, I try to keep it at 0 except for short-term spikes.
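
To apply those rules of thumb to a whole day of data, the sar -g output could be piped through a small filter. This is a sketch, not a definitive tool: pgscan_report is a hypothetical helper name, the 200 and 100 cut-offs are the ones quoted above, the "some scanning" label for values between 0 and 100 is my own addition, and a POSIX-style awk is assumed.

```shell
#!/bin/sh
# pgscan_report: hypothetical helper that labels each interval of
# "sar -g" output (read from standard input) by its pgscan/s value.
pgscan_report() {
    awk '
    BEGIN { col = 0 }
    {
        # Locate the pgscan/s column from the header line.
        header = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "pgscan/s") { col = i; header = 1 }
        }
        if (header || col == 0) next

        # Only timestamped rows (HH:MM:SS) are samples; this skips
        # blank lines and the trailing "Average" row.
        if ($1 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) next

        rate = $col + 0
        if (rate > 200)       label = "short of memory"
        else if (rate >= 100) label = "would benefit from more memory"
        else if (rate > 0)    label = "some scanning"   # assumed label
        else                  label = "ok"
        printf "%s %.2f %s\n", $1, rate, label
    }'
}
```

For example, sar -f /var/adm/sa/sa03 -g | pgscan_report prints one labelled line per interval. A single flagged interval is not conclusive; what matters is whether the high values persist for long periods.
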

