Interpreting swap -s output on Solaris

The fields in the output of the swap -s command on Solaris are explained below:

swap -s
total: 53609376k bytes allocated + 16159792k reserved = 69769168k used, 17837288k available

 

bytes allocated : The total amount of swap space in 1024-byte blocks that is currently allocated as backing store (disk-backed swap space).
reserved: The total amount of swap space in 1024-byte blocks not currently allocated, but claimed by memory for possible future use.
used: The total amount of swap space in 1024-byte blocks that is either allocated or reserved.
available: The total amount of swap space in 1024-byte blocks that is currently available for future reservation and allocation.

 

Swap utilization is commonly calculated with the following formula:

 

Output of ‘swap -s’ is:
total: 2514952k bytes allocated + 202368k reserved = 2717320k used, 7021424k available

Swap Utilization (%) is:
(2717320/(2717320+7021424))*100
= 27.9%
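The same calculation can be scripted. The following is a minimal sketch, not an official tool: `swap_util_pct` is a hypothetical helper name, and it assumes the `swap -s` line has the same field layout as the sample above.

```shell
# Hypothetical helper: read one line of `swap -s` output on stdin and
# print used / (used + available) as a percentage.
swap_util_pct() {
    awk '/used,/ && /available/ {
        for (i = 2; i <= NF; i++) {
            # "+ 0" converts e.g. "2717320k" to the number 2717320
            if ($(i) ~ /^used/)      used  = $(i-1) + 0
            if ($(i) ~ /^available/) avail = $(i-1) + 0
        }
        printf "%.1f\n", used / (used + avail) * 100
    }'
}

echo "total: 2514952k bytes allocated + 202368k reserved = 2717320k used, 7021424k available" | swap_util_pct
```

On a live system the input would come from `swap -s | swap_util_pct`.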

In practice, however, total virtual swap = RAM-backed swap + disk-backed swap.

swap -l reports disk-backed swap usage. It does not report virtual swap usage.

Physical disk swap configured:
# /usr/sbin/swap -l

swapfile dev swaplo blocks free
/dev/zvol/dsk/uppool/swap 181,3 8 163839992 163839992

Total Disk backed swap: 163839992 x 512 = 78G
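The block-to-byte conversion can be sketched as follows. `disk_swap_total` is a hypothetical helper name; it assumes the `swap -l` column layout shown above, with the fourth column holding 512-byte blocks, and sums all swap devices listed.

```shell
# Hypothetical helper: sum the "blocks" column of `swap -l` output (stdin)
# and convert 512-byte blocks to bytes and GiB.
disk_swap_total() {
    awk '$4 ~ /^[0-9]+$/ { blocks += $4 }
         END {
             bytes = blocks * 512
             printf "%.0f bytes = %.1f GiB\n", bytes, bytes / (1024 * 1024 * 1024)
         }'
}

disk_swap_total <<'EOF'
swapfile dev swaplo blocks free
/dev/zvol/dsk/uppool/swap 181,3 8 163839992 163839992
EOF
```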

 

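It is still advisable to monitor paging activity with vmstat -p. The plain vmstat sample below also shows the available virtual swap in its swap column: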

 

# vmstat 5
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id

0 0 0 3296516 38201892 4321 49454 0 0 0 0 0 0 0 6 0 11521 164084 69372 11 31 59
0 0 0 3361076 38193196 3034 34037 0 0 0 0 0 0 0 47 0 9639 107575 37481 8 24 68
0 0 0 3501776 38286380 3325 36763 0 0 0 0 0 0 0 5 0 12679 113673 42466 8 25 67
0 0 0 3545612 38326200 4935 57916 0 0 0 0 0 0 0 63 0 13688 111744 35804 12 31 56 <<

Available virtual swap: 3545612 KB ≈ 3.4G
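Reading the swap column from the latest vmstat sample can be sketched like this. `vmstat_last_swap_kb` is a hypothetical helper name, and the sketch assumes the column order shown above, with swap as the fourth field.

```shell
# Hypothetical helper: print the "swap" column (KB) of the last data row
# in vmstat output read from stdin. Header rows are skipped because their
# first field is not a number.
vmstat_last_swap_kb() {
    awk '$1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ { swap = $4 }
         END { if (swap != "") print swap }'
}

vmstat_last_swap_kb <<'EOF'
 r b w   swap  free  re  mf pi po fr de sr s0 s1 s2 s3   in   sy   cs us sy id
0 0 0 3501776 38286380 3325 36763 0 0 0 0 0 0 0 5 0 12679 113673 42466 8 25 67
0 0 0 3545612 38326200 4935 57916 0 0 0 0 0 0 0 63 0 13688 111744 35804 12 31 56 <<
EOF
```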

Also keep an eye on tmpfs usage: on Solaris, the /tmp directory can consume a large amount of swap.

The space remains allocated until you either delete the files from the /tmp directory or restart the server, because the swap space used by /tmp is cleaned at restart.

Example: If an export is written into the /tmp directory, the available swap space decreases by the size of the export dump file.
=====
solaris_user>swap -s
total: 2879320k bytes allocated + 277104k reserved = 3156424k used, 771104k available

solaris_user>dd if=/dev/zero of=/tmp/test.out count=100
100+0 records in
100+0 records out

solaris_user>swap -s
total: 2879416k bytes allocated + 277072k reserved = 3156488k used, 771040k available
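The change between the two samples can be computed rather than read by eye. This is a small sketch with a hypothetical helper name, `swap_avail_kb`, and it assumes the `swap -s` layout shown above. With dd's default 512-byte block size, count=100 writes 51200 bytes (50 KB), and the available figure drops by a similar amount.

```shell
# Hypothetical helper: print the "available" figure (KB) from one `swap -s` line.
swap_avail_kb() {
    echo "$1" | awk '{ for (i = 2; i <= NF; i++) if ($(i) ~ /^available/) print ($(i-1) + 0) }'
}

before="total: 2879320k bytes allocated + 277104k reserved = 3156424k used, 771104k available"
after="total: 2879416k bytes allocated + 277072k reserved = 3156488k used, 771040k available"

# Drop in available virtual swap between the two samples, in KB.
echo $(( $(swap_avail_kb "$before") - $(swap_avail_kb "$after") ))
```

For the two samples above the drop is 64 KB.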

 

High swap-space usage does not necessarily mean the system needs additional physical memory or that such usage is the reason for bad performance. High swapping in and out activities (observable with vmstat -p) can lead to performance problems: some processes have to wait for swapping activities to be finished before the processes run forward. Moreover, swapping is a single-threaded activity.

In some cases, you must also be aware of the available swap space. For example, the system runs hundreds or even thousands of Oracle session processes or Apache processes, and each process needs to reserve or allocate some swap space. In such cases, you must allocate an adequate swap device or add multiple swap devices.

Tmpfs

One difference between Solaris and other operating systems is /tmp, which is a nonpersistent, memory-based file system on Solaris (tmpfs). Tmpfs is designed for the situation in which a large number of short-lived files (like PHP sessions) need to be written and accessed on a fast file system. You can also create your own tmpfs file system and specify the size. See the man page for mount_tmpfs(1M).

Solaris also provides a ramdisk facility. You can create a ramdisk with ramdiskadm(1M) as a block device. The ramdisk uses physical memory only. By default, at most 25 percent of available physical memory can be allocated to ramdisks. The tmpfs file system uses virtual memory resources that include physical memory and swap space.

Large-sized files placed in tmpfs can affect the amount of memory space left over for program execution. Likewise, programs requiring large amounts of memory use up the space available to tmpfs. If you encounter this constraint (for example, running out of space on tmpfs), you can allocate more swap space by using the swap(1M) command. Avoid swapping in this case because swapping indicates shortage of physical memory and hurts performance even if swap space is sufficient.

tmpfs Filesystem
The tmpfs file system also reports virtual swap usage. tmpfs is a memory-resident file system that uses the page cache for caching file data. Files created in a tmpfs file system avoid physical disk reads and writes. The primary design goal of tmpfs was to improve read/write performance for short-lived files without invoking network or disk I/O.

tmpfs does not use dedicated memory such as a “RAM disk”. Instead it uses virtual memory (VM) maintained by the kernel, which lets it follow the VM and kernel resource allocation policies. tmpfs files are written and read directly from kernel memory. Pages allocated to tmpfs files are treated the same way as any other physical memory pages: physical memory assigned to tmpfs files uses anonymous memory to store file data, and the kernel does not differentiate tmpfs file data from the page cache. Under memory pressure, tmpfs pages can be freed and written back to the physical swap device if the page daemon selects them as candidates.

It is the user’s responsibility to back up tmpfs files by copying them to a disk-based file system such as UFS. Otherwise, tmpfs files are lost on a crash or reboot.

Tmpfs size changes dynamically depending upon how much virtual swap is available.

The "kbytes" column in "df -k /tmp" output is the amount of swap space available, rather than the total.

The tmpfs file system also has a minfree setting, so the total is slightly less than the amount of swap available. The "kbytes" column of the "df -k /tmp" output corresponds to the "available" figure in the "swap -s" output. Normally, these two numbers are very close.

The difference is due to the tmpfs_minfree value, which is 2MB by default.
# df -kl -Z /tmp

Filesystem  kbytes    used     avail    capacity     Mounted on
swap        3449940   116     3449824     1%         /tmp

When a process releases memory, df -k /tmp also shows an increase in the total file system size.

Script: Monitoring Memory and Swap Usage to Avoid a Solaris Hang

Applies to:

Solaris SPARC Operating System – Version: 8.0 and later   [Release: 8.0 and later ]
Solaris x64/x86 Operating System – Version: 8 6/00 U1 and later    [Release: 8.0 and later]
Oracle Solaris Express – Version: 2010.11 and later    [Release: 11.0 and later]
Information in this document applies to any platform.

Goal

A shortage of memory or virtual swap can result in slow system performance, hangs, failure to start new processes (fork failure), cluster timeouts, and therefore unplanned outages. Monitoring resource usage is critical for system availability.

Solution

Physical Memory Shortages

Memory shortages can be caused by excessive kernel or application memory allocation and by leaks. During a memory shortage, the page daemon wakes up and starts scanning and stealing pages to bring the value of the freemem kernel global variable back above the lotsfree kernel threshold. Systems with memory shortages slow down because memory pages may have to be read back from the swap disk before processes can continue executing.

High kernel memory allocation can be monitored by using mdb’s memstat command. It reports kernel, application and file system memory usage:

# echo "::memstat" | mdb -k
Page Summary       Pages      MB    %Tot
------------     -------    ----    ----
Kernel            18330      143     7% < Kernel Memory
ZFS File Data         4        0     0% < ZFS cache (see below)
Anon              36405      284    14% < Application memory: heap, stack, COW
Exec and libs      1747       13     1% < Application libraries
Page cache         3482       27     1% < File system cache
Free (cachelist)   3241       25     1% < Free memory with vnode info intact
Free (freelist)  195422     1526    76% < Free memory

Total            258627     2020
Physical         254812     1990
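To track free memory over time, the two Free rows can be reduced to a single percentage. This is a rough sketch: `memstat_free_pct` is a hypothetical helper name, and it assumes the `::memstat` layout shown above, where the page count is the third field of each Free row and the second field of the Total row.

```shell
# Hypothetical helper: read ::memstat output on stdin and print
# (cachelist + freelist pages) / total pages as a percentage.
memstat_free_pct() {
    awk '$1 == "Free"  { free += $3 }
         $1 == "Total" { total = $2 }
         END { if (total > 0) printf "%.1f\n", free / total * 100 }'
}

memstat_free_pct <<'EOF'
Kernel            18330      143     7%
Anon              36405      284    14%
Free (cachelist)   3241       25     1%
Free (freelist)  195422     1526    76%
Total            258627     2020
EOF
```

On a live system the input would come from `echo "::memstat" | mdb -k | memstat_free_pct`.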

 

If the system is running ZFS, the ZFS cache is also listed. ZFS uses kernel memory to cache file system blocks. You can monitor ZFS cache memory usage using:

# kstat -n arcstats

kstat reports kernel memory usage in pages [8k(sparc), 4k(intel)]. It also reports memory in use by kernel and pages locked by applications.

# kstat -n system_pages
module: unix                            instance: 0
name:   system_pages                    class:    pages

freemem         8337355 < available free memory
..
lotsfree         257271 < Paging starts when freemem drops below lotsfree
minfree           64317 < swapping will start if freemem drops below minfree
pageslocked     4424860 < pages locked excluding pp_kernel (kernel pages)
pagestotal     16465378 < total pages configured
physmem        16487075 < total pages usable by solaris
pp_kernel       4740398 < memory allocated in kernel
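The thresholds annotated above can be turned into a simple check. This is a minimal sketch based only on those annotations: `memory_pressure` is a hypothetical helper name, and all three arguments are page counts.

```shell
# Hypothetical helper: classify memory pressure from system_pages values.
#   $1 = freemem, $2 = lotsfree, $3 = minfree (all in pages)
memory_pressure() {
    if [ "$1" -lt "$3" ]; then
        echo "swapping"      # freemem below minfree
    elif [ "$1" -lt "$2" ]; then
        echo "paging"        # freemem below lotsfree
    else
        echo "ok"
    fi
}

# Values from the kstat sample above.
memory_pressure 8337355 257271 64317
```

On a live system the three values could likely be read individually with kstat's parseable output, for example `kstat -p unix:0:system_pages:freemem`. Treat that invocation as an assumption and verify it on your release.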

kmastat reports memory usage in kernel slab caches. These caches are used by various kernel subsystems and drivers for allocating memory.

# echo "::kmastat" | mdb -k
cache                    buf     buf     buf       memory   alloc      alloc
name                     size    in use  total     in use   succeed    fail
----------------------  ------   ------  ------    ------   -------    -----
..
kmem_slab_cache            56     2455     2465    139264       2571     0
kmem_bufctl_cache          24     5463     5763    139264       6400     0
kmem_bufctl_audit_cache   128        0        0         0          0     0
kmem_va_8192             8192       74       96    786432         74     0
kmem_va_16384           16384        2       16    262144          2     0
kmem_va_24576           24576        5       10    262144          5     0
kmem_va_32768           32768        1        8    262144          1     0
kmem_va_40960           40960        0        0         0          0     0
kmem_va_49152           49152        0        0         0          0     0
kmem_va_57344           57344        0        0         0          0     0
kmem_va_65536           65536        0        0         0          0     0
kmem_alloc_8                8    97210    98649    794624    3884007     0
kmem_alloc_16              16    29932    30988    499712    9786629     0
kmem_alloc_24              24    43651    44409  1073152    69596060     0
kmem_alloc_32              32    11512    12954    417792   71088529     0

To isolate high kernel memory allocation and leaks, turn on kernel memory auditing by setting the following tunable in the /etc/system file, then reboot:

set kmem_flags=0x1

Continue to run kmastat on a regular basis and monitor the growth of the kernel caches. Force a system panic when kernel memory allocation reaches an alarming level, and send the kernel core dump located in the /var/crash directory to Oracle Support for analysis.
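One way to watch cache growth is to save the kmastat output periodically and compare two snapshots. The sketch below uses a hypothetical helper name, `kmastat_growth`. It assumes each cache row has the seven columns shown above, with "memory in use" as a plain byte count in the fifth column. Releases that print size suffixes would need extra parsing.

```shell
# Hypothetical helper: list caches whose "memory in use" grew between two
# saved ::kmastat snapshots, with the growth in bytes.
#   $1 = older snapshot file, $2 = newer snapshot file
kmastat_growth() {
    awk 'NF == 7 && $5 ~ /^[0-9]+$/ {
             if (NR == FNR) {
                 old[$1] = $5 + 0                 # first file: remember usage
             } else if (($1 in old) && ($5 + 0) > old[$1]) {
                 print $1, ($5 + 0) - old[$1]     # second file: report growth
             }
         }' "$1" "$2"
}

# Example (file names are placeholders):
#   kmastat_growth kmastat.old kmastat.new
```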

To monitor application memory usage consider using:

$ prstat -s rss -can 100
$ ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args'

To see which memory segment in the process has high memory allocation:

$ pmap -xs <pid>

Continued growth in application memory usage is a sign of a memory leak. You can ask the application vendor for tools, or consider linking with libumem(3LIB), which offers a rich set of debugging facilities. See article on how to use it. You can also monitor application malloc() calls using DTrace scripts.

Process allocation (via malloc()) requested size distribution plot:

dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p PID

Process allocation (via malloc()) by user stack trace and total requested size:

dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID

 

Virtual Memory Shortages:

Processes use virtual memory. A process’s virtual address space is made up of a number of memory segments: text, data, stack, heap, and COW segments. When a process accesses a virtual address whose page is not yet in physical memory, a page fault brings the data into physical memory, and the faulted virtual address is then mapped to that physical memory. All pages reside in a memory segment and have a backing store to which the pages within the segment can be migrated during memory shortages. Text and data segments are backed by the executable file on the file system. Stack, heap, COW (copy-on-write), and shared memory pages are anonymous (anon) pages and are backed by virtual swap.

An ISM segment does not require swap reservation because all of its pages are locked in memory by the kernel and are not candidates for swapping.

A DISM segment requires swap reservation because its memory can be locked and unlocked by the process.

When a process uses DISM, it selectively increases the size of the SGA by locking ranges. If the DISM region cannot be locked but continues to be used as SGA for DB block caching, Oracle DB performance may suffer because accessing these pages results in page faults. See Doc: 1018855.1

When a process starts touching pages, anon structures are allocated, but no physical disk swap is allocated. Swap allocation in Solaris happens only when memory is short and pages need to be migrated to the swap device to keep up with the workload’s memory demand. That is why “swap -l”, which reports physical disk swap allocation, shows the same value in the “blocks” and “free” columns under normal conditions.

Solaris can run without physical disk swap because of the swapfs abstraction, which acts as if real swap space were backing each page. Solaris works with virtual swap, which is composed of physical memory and physical disk swap. When no physical disk swap is configured, swap reservation happens against physical memory. The drawback of reserving against memory is that the system cannot malloc() more than the configured physical memory. The advantage of running without physical disk swap is that a malicious program cannot perform huge mallocs and therefore cannot make the system crawl due to memory shortage.

Virtual swap = Physical memory + Physical Disk swap
Available virtual swap is reported by:

  • vmstat: swap
  • swap -s

Disk-backed swap is reported by:

  • swap -l


Per-process virtual swap reservation can be displayed with:

 

  •  pmap -S <pid>

prstat can provide the virtual memory usage (SIZE) of a process; however, that figure covers the virtual memory used by all memory segments, not just anon memory:

  • prstat -s size -can 100 15
  • prstat -s size -can -p <pidlist> 100 15

You can dump the process address space, showing all segments, using:

  • pmap -xs <pid>

 

When a process calls malloc()/sbrk(), only virtual swap is reserved. Reservation is done against physical disk swap first. If that is exhausted or not configured, reservation is done against physical memory. If both are exhausted, malloc() fails. To make sure malloc() does not fail for lack of virtual swap, configure a large physical disk swap in the form of a disk or file. You can monitor swap reservation via “swap -s” and “vmstat:swap”, as described above.

On a system with plenty of memory, “swap -l” reports the same value in the “blocks” and “free” columns.

A large “free” value in “swap -l” does not mean that plenty of virtual swap is available and that malloc() will not fail: “swap -l” reports only physical disk swap allocation, not virtual swap usage. It is “swap -s” and “vmstat:swap” that report how much virtual swap is available for reservation.

Script to monitor memory usage:

#!/bin/ksh

# Script monitors kernel and application memory usage.
# Every 10 minutes it appends a snapshot to mem.out and restarts the
# background vmstat, mpstat and prstat collectors.

PATH=/bin:/usr/bin:/usr/sbin; export PATH

# Kill the background collectors started in the current cycle.
restart()
{
    for PID in $PIDLIST
    do
        kill -9 $PID 2>/dev/null
    done
}

# Signal handler: stop the collectors and exit.
# Named cleanup (not killall) so it cannot be confused with /usr/sbin/killall.
cleanup()
{
    restart
    exit
}

# SIGKILL cannot be trapped, so it is not listed here.
trap "cleanup" HUP INT QUIT TERM USR1 USR2

DIR=DATA.`date +%Y%m%d-%T`
TS=`date +%Y%m%d-%T`

mkdir $DIR || exit 1
cd $DIR || exit 1

while true
do
    TS=`date +%Y%m%d-%T`
    echo $TS >> mem.out
    echo "output of ::memstat" >> mem.out
    echo "::memstat" | mdb -k >> mem.out
    echo "output of kstat -n ZFS ARC memory usage" >> mem.out
    kstat -n arcstats >> mem.out
    echo "output of ::kmastat" >> mem.out
    echo "::kmastat" | mdb -k >> mem.out
    echo "output of swap -s and swap -l" >> mem.out
    echo "swap -s" >> mem.out
    swap -s >> mem.out
    echo "swap -l" >> mem.out
    swap -l >> mem.out
    echo "output of ps" >> mem.out
    /usr/bin/ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args' >> mem.out
    #
    # start vmstat, mpstat and prstat in the background
    #
    PIDLIST=""
    echo $TS >> vmstat.out
    vmstat 5 >> vmstat.out &
    PIDLIST="$PIDLIST $!"
    echo $TS >> mpstat.out
    mpstat 5 >> mpstat.out &
    PIDLIST="$PIDLIST $!"
    echo $TS >> prstat.out
    prstat -s rss -can 100 >> prstat.out &
    PIDLIST="$PIDLIST $!"

    sleep 600 # every 10 minutes

    restart
done

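Oracle Solaris 11 Express released

Oracle Solaris 11 Express builds on Solaris 10 and strengthens a range of features. Solaris 11 Express is intended to provide the best UNIX experience for critical enterprise environments (its counterpart, Oracle Enterprise Linux, is intended to provide the best Linux experience). For example, the new network-based package management tools can greatly reduce system downtime and provide a complete, safe system upgrade path; the built-in network virtualization and delegated administration bring unprecedented flexibility to application consolidation; and Solaris continues to provide the industry's highest level of system security. Oracle claims that Solaris 11 Express is the most exciting release of the Solaris platform to date.
Oracle Solaris 11 Express has been fully tested on a wide variety of SPARC and x86 hardware supplied by Oracle and by third-party hardware vendors. In addition, Solaris 11 Express support for Oracle's Exadata Database Machine and the Exalogic cloud is coming soon.

The Solaris 11 Express installation media can now be downloaded from Oracle OTN. What will Oracle's own UNIX operating system look like?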

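Oracle formally declares war on IBM in hardware

In a public notice to former Sun customers, Larry Ellison declared, in combative terms, that Oracle would fight the market leader IBM for dominance in hardware.

The notice made the following four points to former Sun customers:

  1. Spend more on SPARC development than Sun did. (Money alone, of course, seems a bit abstract.)
  2. Spend more on Solaris operating system development than Sun did. (Solaris is already very mature; perhaps Oracle will make it more automated, with auto tuning and better support? A big Metalink plan?)
  3. Put more than twice as many sales and technical specialists as Sun on the SPARC and Solaris business. (More salespeople, naturally; that is Oracle's usual approach.)
  4. Integrate Oracle software with Sun hardware, which will noticeably improve the performance of the former Sun hardware.

The notice closes with a statement from Larry Ellison:

We’re in it to win it. IBM, we’re looking forward to competing with you in the hardware business.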

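This declaration of war shows Oracle's determination to grow in hardware and its willingness to take on IBM.

Two companies both known for their sales strength (as opposed to technology-driven companies such as Sun) will show us a new round of large-scale IT mergers over the next few years.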
