Script: Monitoring Memory and Swap Usage to Avoid a Solaris Hang

Applies to:

Solaris SPARC Operating System – Version: 8.0 and later [Release: 8.0 and later]
Solaris x64/x86 Operating System – Version: 8 6/00 U1 and later [Release: 8.0 and later]
Oracle Solaris Express – Version: 2010.11 and later [Release: 11.0 and later]
Information in this document applies to any platform.

Goal

A shortage of memory or virtual swap can result in slow system performance, hangs, failure to start new processes (fork failure), cluster timeouts and, ultimately, unplanned outages. Monitoring resource usage is therefore critical for system availability.

Solution

Physical Memory Shortages

Memory shortages can be caused by excessive kernel or application memory allocation and by memory leaks. During a memory shortage, the page daemon wakes up and starts scanning and stealing pages to bring the value of freemem, a kernel global variable, back above the lotsfree kernel threshold. Systems with memory shortages slow down because memory pages may have to be read back from the swap disk before processes can continue executing.

High kernel memory allocation can be monitored with mdb's ::memstat dcmd, which reports kernel, application and file system memory usage:

# echo "::memstat"|mdb -k
Page Summary       Pages      MB    %Tot
------------    --------  ------    ----
Kernel            18330      143     7% < Kernel Memory
ZFS File Data         4        0     0% < ZFS cache (see below)
Anon              36405      284    14% < Application memory: heap, stack, COW
Exec and libs      1747       13     1% < Application libraries
Page cache         3482       27     1% < File system cache
Free (cachelist)   3241       25     1% < Free memory with vnode info intact
Free (freelist)  195422     1526    76% < Free memory

Total            258627     2020
Physical         254812     1990
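When ::memstat is sampled repeatedly, it is often easier to compare only the percentage column. The following is a minimal sketch, assuming the output layout shown above (category name, pages, MB, then a field ending in "%"); the function name memstat_pct is hypothetical and not part of Solaris.

```shell
# Sketch: reduce ::memstat output to "<percent> <category>" lines.
# Assumes each category line has the layout: name, pages, MB, NN%.
memstat_pct() {
    awk '{
        for (i = 4; i <= NF; i++) {
            if ($i ~ /^[0-9]+%$/) {
                # Everything before the three numeric columns is the name.
                name = $1
                for (j = 2; j <= i - 3; j++) name = name " " $j
                pct = $i
                sub(/%$/, "", pct)
                print pct, name
                break
            }
        }
    }'
}

# Usage on a live system (as root):
#   echo "::memstat" | mdb -k | memstat_pct | sort -rn
```

Header, separator and total lines carry no percentage field, so they are skipped.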

 

If the system is running ZFS, the ZFS cache is also listed. ZFS uses kernel memory to cache file system blocks. You can monitor ZFS cache (ARC) memory usage using:

# kstat -n arcstats
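The arcstats output is long, and for memory monitoring the current cache size is usually the value of interest. The sketch below assumes the parseable form "kstat -p zfs:0:arcstats:size", which prints the statistic name and its value in bytes on one line; arc_size_mb is a hypothetical helper name.

```shell
# Sketch: convert the ARC "size" statistic (bytes) to whole megabytes.
# Expects "kstat -p" style input: <module:instance:name:stat> <value>
arc_size_mb() {
    awk '$1 == "zfs:0:arcstats:size" { printf "%d\n", $2 / 1048576 }'
}

# Usage on a live system:
#   kstat -p zfs:0:arcstats:size | arc_size_mb
```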

The system_pages kstat reports memory usage in pages (8 KB on SPARC, 4 KB on x86). It also reports the memory in use by the kernel and the pages locked by applications.

# kstat -n system_pages
module: unix                            instance: 0
name:   system_pages                    class:    pages

freemem         8337355 < available free memory
..
lotsfree         257271 < Paging starts when freemem drops below lotsfree
minfree           64317 < swapping will start if freemem drops below minfree
pageslocked     4424860 < pages locked excluding pp_kernel (kernel pages)
pagestotal     16465378 < total pages configured
physmem        16487075 < total pages usable by solaris
pp_kernel       4740398 < memory allocated in kernel
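The thresholds above can be checked mechanically. The following sketch assumes the parseable output of "kstat -p unix:0:system_pages" (one "name value" pair per line) and classifies the system using the two rules annotated above; page_pressure is a hypothetical name.

```shell
# Sketch: compare freemem with lotsfree and minfree.
# Prints "ok", "paging" (freemem < lotsfree), "swapping"
# (freemem < minfree), or "unknown" if a value is missing.
page_pressure() {
    awk '
        { n = split($1, k, ":"); v[k[n]] = $2 }
        END {
            if (!("freemem" in v) || !("lotsfree" in v) || !("minfree" in v)) {
                print "unknown"
                exit
            }
            if (v["freemem"] + 0 < v["minfree"] + 0) print "swapping"
            else if (v["freemem"] + 0 < v["lotsfree"] + 0) print "paging"
            else print "ok"
        }'
}

# Usage on a live system:
#   kstat -p unix:0:system_pages | page_pressure
```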

::kmastat reports memory usage in the kernel slab caches. These caches are used by various kernel subsystems and drivers to allocate memory.

# echo "::kmastat"|mdb -k
cache                    buf     buf     buf       memory   alloc      alloc
name                     size    in use  total     in use   succeed    fail
----------------------  ------   ------  ------    ------   -------  -----
..
kmem_slab_cache            56     2455     2465    139264       2571     0
kmem_bufctl_cache          24     5463     5763    139264       6400     0
kmem_bufctl_audit_cache   128        0        0         0          0     0
kmem_va_8192             8192       74       96    786432         74     0
kmem_va_16384           16384        2       16    262144          2     0
kmem_va_24576           24576        5       10    262144          5     0
kmem_va_32768           32768        1        8    262144          1     0
kmem_va_40960           40960        0        0         0          0     0
kmem_va_49152           49152        0        0         0          0     0
kmem_va_57344           57344        0        0         0          0     0
kmem_va_65536           65536        0        0         0          0     0
kmem_alloc_8                8    97210    98649    794624    3884007     0
kmem_alloc_16              16    29932    30988    499712    9786629     0
kmem_alloc_24              24    43651    44409  1073152    69596060     0
kmem_alloc_32              32    11512    12954    417792   71088529     0

To isolate high kernel memory allocation and leaks, turn on kernel memory auditing by setting the following tunable in the /etc/system file, then reboot:

set kmem_flags=0x1

Continue to run ::kmastat at regular intervals and monitor the growth of the kernel caches. Force a system panic when kernel memory allocation reaches an alarming level, and send the kernel core dump located in the /var/crash directory to Oracle Support for analysis.
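Comparing two saved ::kmastat snapshots makes cache growth easier to spot than reading the raw tables. The sketch below assumes the seven-column layout shown earlier, with "memory in use" as a plain byte count in the fifth column; releases that print this column with unit suffixes would need a different parser. The function name kmastat_growth and the snapshot file names are hypothetical.

```shell
# Sketch: list caches whose "memory in use" grew between two snapshots.
# $1 = older ::kmastat output file, $2 = newer ::kmastat output file.
# Prints "<cache name> <growth in bytes>" for each cache that grew.
kmastat_growth() {
    awk '
        NF == 7 && $5 ~ /^[0-9]+$/ {
            if (FNR == NR) { old[$1] = $5; next }   # first file
            if (($1 in old) && $5 + 0 > old[$1] + 0)
                printf "%s %d\n", $1, $5 - old[$1]
        }' "$1" "$2"
}

# Usage on a live system (as root):
#   echo "::kmastat" | mdb -k > kmastat.1
#   sleep 600
#   echo "::kmastat" | mdb -k > kmastat.2
#   kmastat_growth kmastat.1 kmastat.2 | sort -k2,2nr
```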

To monitor application memory usage consider using:

$ prstat -s rss -can 100
$ ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args'

To see which memory segment in the process has high memory allocation:

$ pmap -xs <pid>

Continued growth in application memory usage is a sign of a memory leak. You can ask the application vendor for diagnostic tools, or consider linking the application with libumem(3LIB), which offers a rich set of debugging facilities. See the article on how to use it. You can also monitor the application's malloc() calls using DTrace scripts.

Process allocation (via malloc()) requested size distribution plot:

dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p PID

Process allocation (via malloc()) by user stack trace and total requested size:

dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID
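Before reaching for libumem or DTrace, it can help to confirm that a process really is growing steadily. The sketch below is one possible check, not a leak detector: it reads one RSS sample per line and reports whether the series never decreased and ended higher than it started. The function name rss_trend and the log file name are hypothetical.

```shell
# Sketch: summarise a series of RSS samples (one number per line).
# "growing" means no sample was lower than the one before it and the
# last sample is higher than the first.
rss_trend() {
    awk '
        NF == 0 { next }
        n == 0 { first = $1 + 0; prev = first }
        { cur = $1 + 0; if (cur < prev) dips++; prev = cur; n++ }
        END {
            if (n < 2) { print "insufficient samples"; exit }
            verdict = (dips == 0 && cur > first) ? "growing" : "not steadily growing"
            printf "first=%d last=%d %s\n", first, cur, verdict
        }'
}

# Usage: sample the RSS (in KB) of one process every minute, then
# summarise the samples collected so far.
#   while :; do ps -o rss= -p <pid>; sleep 60; done > rss.log &
#   rss_trend < rss.log
```

A steadily growing series is only a hint; caches and warm-up phases can produce the same pattern for a while.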

 

Virtual Memory Shortages

Processes use virtual memory. A process's virtual address space is made up of a number of memory segments: text, data, stack, heap and COW (copy-on-write) segments. When a process accesses a virtual address whose page is not yet in physical memory, a page fault occurs that brings the data into physical memory, and the faulted virtual address is then mapped to that physical page. Every page belongs to a memory segment and has a backing store to which the pages of the segment can be migrated during memory shortages. Text and data segments are backed by the executable file on the file system. Stack, heap, COW and shared memory pages are anonymous (Anon) pages and are backed by virtual swap.

An ISM segment does not require a swap reservation, because all of its pages are locked in memory by the kernel and are not candidates for swapping.

A DISM segment requires a swap reservation, because its memory can be locked and unlocked by the process.

When a process uses DISM, it selectively increases the size of the SGA by locking ranges of the segment. Failing to lock the DISM region while continuing to use it as SGA for DB block caching may result in slow Oracle DB performance, because accessing these pages results in page faults. See Doc: 1018855.1

When a process starts touching pages, anon structures are allocated, but no physical disk swap is allocated. Swap allocation in Solaris happens only when memory is short and pages need to be migrated to the swap device to keep up with the workload's memory demand. That is why “swap -l”, which reports physical disk swap allocation, shows the same value in the “blocks” and “free” columns under normal conditions.

Solaris can run without physical disk swap thanks to the swapfs abstraction, which acts as if real swap space were backing each page. Solaris works with virtual swap, which is composed of physical memory and physical disk swap. When no physical disk swap is configured, swap reservations are made against physical memory. The drawback of reserving swap against memory is that the system cannot malloc() more than the physical memory configured. The advantage of running without physical disk swap is that a malicious program cannot perform huge mallocs and thus cannot cause the system to crawl due to memory shortages.

Virtual swap = Physical memory + Physical Disk swap
Available virtual swap is reported by:

  • vmstat: swap
  • swap -s

Disk-backed swap is reported by:

  • swap -l
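For scripted checks, the available virtual swap can be pulled out of the one-line "swap -s" summary. This is a minimal sketch, assuming the usual summary wording in which the last figure is followed by the word "available" and carries a "k" suffix; swap_avail_kb is a hypothetical helper name.

```shell
# Sketch: print the "available" figure from "swap -s" output, in KB.
swap_avail_kb() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "available") {
                v = $(i - 1)
                sub(/k$/, "", v)
                print v
            }
        }
    }'
}

# Usage on a live system:
#   swap -s | swap_avail_kb
```

The result can be compared against a site-specific threshold to raise an alert before reservations start to fail.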


Per-process virtual swap reservation can be displayed using:

  • pmap -S <pid>

prstat reports the virtual memory usage (SIZE) of a process; however, SIZE covers the virtual memory used by all memory segments, not just anon memory:

  • prstat -s size -can 100 15
  • prstat -s size -can -p <pidlist> 100 15

You can dump the process address space, showing all segments, using:

  • pmap -xs <pid>

 

When a process calls malloc()/sbrk(), only virtual swap is reserved. The reservation is made against physical disk swap first. If that is exhausted or not configured, the reservation is made against physical memory. If both are exhausted, malloc() fails. To make sure malloc() does not fail for lack of virtual swap, configure a large physical disk swap in the form of a disk or a file. You can monitor swap reservation via “swap -s” and “vmstat:swap”, as described above.
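The reservation order described above can be illustrated with a toy model. This is only a sketch of the arithmetic, not kernel behaviour: it assumes a request may be satisfied partly from free disk swap and partly from memory, and it ignores the memory the kernel keeps back for itself. The function name and its three arguments (request, free disk swap, reservable memory, all in KB) are hypothetical.

```shell
# Toy model: reserve against disk swap first, then against memory.
# Prints how the request would be split, or "fail" if both are
# exhausted. All values are in KB and are illustrative only.
reserve_model() {
    req=$1; disk=$2; mem=$3
    if [ "$req" -le "$disk" ]; then
        echo "disk=$req mem=0"
    elif [ "$req" -le $((disk + mem)) ]; then
        echo "disk=$disk mem=$((req - disk))"
    else
        echo "fail"
        return 1
    fi
}

# Usage:
#   reserve_model 700 500 1000     # spills over from disk swap to memory
#   reserve_model 700 0 500        # no disk swap, request exceeds memory
```

With no disk swap configured (second argument 0), the model shows why a request larger than memory cannot be reserved.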

On a system with plenty of memory, “swap -l” reports the same value in the “blocks” and “free” columns.
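The difference between the two columns is the amount of disk swap actually in use. A minimal sketch, assuming the standard "swap -l" layout in which the fourth and fifth columns are "blocks" and "free" in 512-byte units; swap_disk_used_kb is a hypothetical name.

```shell
# Sketch: total disk swap in use across all swap devices, in KB.
# Skips the header line; blocks are 512 bytes, so 2 blocks = 1 KB.
swap_disk_used_kb() {
    awk 'NR > 1 && NF >= 5 { used += $4 - $5 } END { printf "%d\n", used / 2 }'
}

# Usage on a live system:
#   swap -l | swap_disk_used_kb
```

A result of 0 corresponds to the normal condition described above; a non-zero and growing value means pages are being migrated to the swap device.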

A large value in the “free” column of “swap -l” does not mean that plenty of virtual swap is available, nor that malloc() will not fail. “swap -l” provides no information about virtual swap usage; it reports only physical disk swap allocation. It is “swap -s” and “vmstat:swap” that report how much virtual swap is available for reservation.

Script to monitor memory usage:

#!/bin/ksh
#
# Script monitors kernel and application memory usage.
# Run it as root: "mdb -k" needs access to the live kernel.

PATH=/bin:/usr/bin:/usr/sbin; export PATH

PIDLIST=""

# Kill the background collectors started in the current cycle.
restart()
{
    for PID in $PIDLIST
    do
        kill -9 $PID 2>/dev/null
    done
    PIDLIST=""
}

# Signal handler: stop the collectors and exit. It is not named
# "killall" because /usr/sbin/killall is also a Solaris command.
cleanup()
{
    restart
    exit
}

# KILL is not listed because SIGKILL cannot be trapped.
trap cleanup HUP INT QUIT TERM USR1 USR2

DIR=DATA.`date +%Y%m%d-%T`

mkdir "$DIR" || exit 1
cd "$DIR" || exit 1

while true
do
    TS=`date +%Y%m%d-%T`
    echo "$TS" >> mem.out
    echo "output of ::memstat" >> mem.out
    echo "::memstat" | mdb -k >> mem.out
    echo "output of kstat -n arcstats (ZFS ARC memory usage)" >> mem.out
    kstat -n arcstats >> mem.out
    echo "output of ::kmastat" >> mem.out
    echo "::kmastat" | mdb -k >> mem.out
    echo "output of swap -s and swap -l" >> mem.out
    echo "swap -s" >> mem.out
    swap -s >> mem.out
    echo "swap -l" >> mem.out
    swap -l >> mem.out
    echo "output of ps" >> mem.out
    /usr/bin/ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args' >> mem.out
    #
    # start vmstat, mpstat and prstat in the background
    #
    PIDLIST=""
    echo "$TS" >> vmstat.out
    vmstat 5 >> vmstat.out &
    PIDLIST="$PIDLIST $!"
    echo "$TS" >> mpstat.out
    mpstat 5 >> mpstat.out &
    PIDLIST="$PIDLIST $!"
    echo "$TS" >> prstat.out
    prstat -s rss -can 100 >> prstat.out &
    PIDLIST="$PIDLIST $!"

    sleep 600 # every 10 minutes

    restart
done
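Once the script has been running for a while, mem.out holds one timestamped block per cycle. The following sketch shows one way to turn it into a quick trend of free memory. It assumes the layout the script writes (a timestamp line in the date +%Y%m%d-%T format, followed later by the ::memstat table); free_trend is a hypothetical helper, not part of the original script.

```shell
# Sketch: print "<timestamp> <free %>" for every cycle in mem.out,
# taking the percentage from the "Free (freelist)" line of ::memstat.
free_trend() {
    awk '
        /^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-/ { ts = $1; next }
        $1 == "Free" && $2 == "(freelist)" {
            pct = $5
            sub(/%$/, "", pct)
            print ts, pct
        }'
}

# Usage:
#   free_trend < DATA.<timestamp>/mem.out
```

A steadily falling percentage across cycles is the cue to look at the ::kmastat and ps sections of the same file for the consumer.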