When a system is running slowly and performance is degrading, it is difficult to know what the cause is. Whether the cause is a lack of memory, disk subsystem bottleneck, or limited scalability of a particular application, there are ways to find, understand, and possibly remove the root cause.
This article gives suggestions on where to start. It covers how to approach performance concerns and address some common performance bottlenecks, introducing a number of concepts such as Intimate Shared Memory (ISM) and priority paging, which are intertwined with performance. The emphasis is on the Solaris 2.6, 7, and 8 Operating Environments. It is not a complete treatment of all performance issues, but is intended as a place to start, to stimulate your thinking about Solaris system performance and suggest where to go next.
Performance, perhaps more than any other aspect of computer system behavior, requires a holistic approach. To identify a cause rooted in one or more components, a structured approach is a must.
The practical upshot is that for performance the single most important part of the troubleshooting process is to define the problem you are trying to solve. In practical terms this means defining an operation or test case for which:
A) You know how fast it goes now.
B) You have a requirement for it to go "X" times faster, or it has gone "X" times faster under different circumstances.
Setting the baseline from which to start is the first step. Performance analysis is a top-down sport that starts with a clear and concise statement of the problem to be solved. If you want a system to go faster, you still need to define what attribute of that system you aim to improve and what tradeoffs you will and won't accept. Until you can clearly describe the symptoms of the problem or opportunity, identifying the root cause will always be hit or miss.
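As a minimal sketch of this baseline step, the Python fragment below times a test case and compares the result with the target implied by an "X times faster" requirement. The function names, and the choice of keeping the best of several runs, are assumptions made for illustration; they are not part of any Solaris tool.

```python
import time


def time_operation(operation, repeats=5):
    """Return the best wall-clock time in seconds over several runs of operation()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        best = min(best, time.perf_counter() - start)
    return best


def required_time(baseline_seconds, speedup):
    """Target time if the operation must run `speedup` times faster than the baseline."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


def meets_target(measured_seconds, baseline_seconds, speedup):
    """True when the measured time is at or below the target time."""
    return measured_seconds <= required_time(baseline_seconds, speedup)
```

For example, a query that takes 12 seconds today and must run three times faster has a target of 4 seconds; any later measurement can be checked against that figure rather than against a general impression of speed.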
Performance analysis is much like detective work where we establish the facts of the case through evidence and observation, being very careful not to jump to a premature conclusion that does not fit the facts -- only naming the suspect when the weight of evidence is overwhelming.
Be skeptical about all assumptions. What others state as a fact may really be an assumption that may or may not be correct. If the assumption is wrong, you may be working with false evidence and will arrive at an incorrect conclusion.
Some words of warning. The Solaris OE is in most cases very good at tuning itself for the workload in hand. The later the release, the less tuning should be required. It has often been found that the root cause of a performance problem is an attempt at performance tuning. Pay attention to the application first and the Operating Environment last.
Any change to the system configuration, such as memory size or disk layout, means that performance settings should be checked for their current validity. This is also true of an upgrade, where carrying parameters over from the old release may limit the performance of the new OE.
What operation(s) do you see that are symptoms of the performance problem(s)?
For example, are particular types of database query, file, or network operations slower than you think they should be? How specific can you be about the operation in terms of providing a test case, such as an SQL query or 30 lines of C?
Define your problem statement as precisely as possible to explain "what is wrong with what" to your best knowledge. Some examples of good problem statements include:
· An SQL query takes two times longer on VxFS when compared to UFS.
· SVR4 message queue operations take 30 percent longer on OE revision "A" compared to OE revision "B."
· Login to system "A" takes three times longer than login to system "Y."
A problem statement should not contain the solution or a possible solution.
Most of the time, a clear statement of the problem gets you more than halfway to solving it. It is important to take into account the perspective of the user in stating the problem you are trying to solve, which means taking the application perspective. Keeping possible solutions out of the problem statement goes against human nature, which tries to prove or disprove a possible cause by experimenting rather than assessing the merit of a cause relative to the observed facts.
Poor problem statements include:
· mpstat "wt" column shows a high wait time.
· User jobs take too long.
The boundary between the correct functioning of a system and its applications and a performance problem is often a gray area. Entire system hangs and process hangs are beyond the scope of this article. If you suspect incorrect functioning of the system as opposed to a performance problem, then log a call with your Sun Solution Center to develop a course of action. A prerequisite for a high-performance system is that it function correctly.
As part of your proactive maintenance schedule, it is worth checking /var/adm/messages for indications of hardware issues such as disk retries or excessive message generation.
It is well worth looking back at the history of the system. If the system has performed better in the past, draw a timeline showing the changes made before poor performance was first noticed and when poor performance has been seen since.
It is a good idea to keep some examples of how your system operates properly. You can easily collect and store monthly performance data, such as:
· *stat family: vmstat, mpstat, iostat, vxstat
· ps output to show what processes are running (prstat on the Solaris 8 OE)
In addition, a number of commercial and unsupported products are available for performance monitoring.
One of the issues with many such products is that threshold values are different for different hardware configurations. For example, certain values would be considered excessive and may bring a 400-MHz system to a crawl, but they may be acceptable for a 900-MHz system.
Once you have defined the performance problem you are trying to solve, the next step is to narrow down the area in which the bottleneck occurs.
Questions worth asking at this stage include:
A. What can the application tell me about what it sees as a bottleneck? Taking Oracle as an example, an Oracle DBA should know what the BSTAT/ESTAT reports are and how to run and interpret them. Again, taking the application perspective, BSTAT/ESTAT output may show the bottleneck that is limiting Oracle performance and serve as a guide for further analysis.
B. Where are we spending the most time, in kernel or user land? Answer with vmstat, mpstat or sar, ps, and prstat.
C. Are all resources of a similar type equally busy? The intent is to find unequal distribution of resources. For example, one disk may be a bottleneck, or one CPU may be busier than the others. For CPUs, look at mpstat. For disks, use iostat.
D. What process or processes are using the most resources? To see the top processes using CPU and memory resources, use:
ps -eo pid,pcpu,args | sort +1n
ps -eo pid,vsz,args | sort +1n
/usr/ucb/ps aux |more
The Solaris 8 OE provides prstat, which gives a running commentary of CPU and memory use. The output from prstat -cvm is very useful.
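When ps output is collected for later comparison, the ranking that the ps pipelines above perform can also be done in a script. The following Python sketch sorts ps -eo pid,pcpu,args output by its second column. The sample text is hypothetical, not captured from a real system, and the function name is an assumption for illustration.

```python
# Hypothetical output of: ps -eo pid,pcpu,args
PS_SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 /etc/init -
  412  0.3 /usr/sbin/syslogd
 2290 41.7 ora_dbw0_PROD
 2301 12.9 ora_lgwr_PROD
 3107  1.2 -ksh
"""


def top_consumers(ps_output, limit=3):
    """Return (pid, value, args) tuples, largest second-column value first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue  # blank or malformed line
        args = parts[2] if len(parts) > 2 else ""
        rows.append((int(parts[0]), float(parts[1]), args))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


for pid, pcpu, args in top_consumers(PS_SAMPLE, limit=2):
    print(pid, pcpu, args)
```

The same function works for the pid,vsz,args form, because the second column is numeric in both cases.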
We now look at how to use some of the common Solaris commands for initial performance analysis.
The command vmstat is concise. Here we can see an example of insufficient CPU capacity for the executing applications.
% vmstat 15
procs      memory               page               disk         faults       cpu
 r b w    swap   free re  mf  pi po   fr de  sr m0 m1 m2 m3   in   sy   cs us sy id
45 0 0 2887216 182104  3 707 449  6  455  0  80  2  6  1  0 1531 5797  983 61 30  9
58 0 0 2831312  46408  5 983 582 56 3211  0 492  0  0  0  0 1413 4797 1027 69 31  0
55 0 0 2830944  56064  2 649 656  3  806  0 121  0  0  0  0 1441 4627  989 69 31  0
57 0 0 2827704  48760  4 818 723  6  800  0 121  0  0  1  0 1606 4316 1160 66 34  0
56 0 0 2824712  47512  6 857 604 56 1736  0 261  0  0  1  0 1584 4939 1086 68 32  0
58 0 0 2813400  47056  7 856 673 33 2374  0 355  0  0  0  0 1676 5112 1114 70 30  0
60 1 0 2816712  49464  7 861 720  6  731  0 110  7  0  3  0 2329 6131 1067 64 36  0
58 0 0 2817552  48392  4 585 521  0  996  0 146  0  0  0  0 1357 6724 1059 71 29  0
Always ignore the first line of vmstat output, which summarizes activity since boot. The column labeled "r" under the "procs" section is the run queue: the number of processes waiting to get on the CPUs. The "id" column is CPU idle time. This machine lacks the CPU resources to keep up with the process demand: the run queue holds 55 to 60 processes, idle time is zero, and the majority of CPU time is spent in user space (see the "us" column).
Two approaches can be taken here -- first, add extra CPUs, or second, profile the application code to determine whether part of the application can be optimized. A great deal of effort can be expended profiling sections of code -- sometimes for little gain. It's a good idea to be realistic when assessing your potential "return on investment" in relation to your time.
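The reading of the vmstat 15 example above can be sketched in a few lines of Python. The data rows are copied from that example. The function names, the CPU count of 4, and the 5 percent idle threshold are assumptions chosen for illustration; the article does not state how many CPUs the machine has.

```python
# Column header and data rows copied from the vmstat 15 example.
VMSTAT_SAMPLE = """\
r b w swap free re mf pi po fr de sr m0 m1 m2 m3 in sy cs us sy id
45 0 0 2887216 182104 3 707 449 6 455 0 80 2 6 1 0 1531 5797 983 61 30 9
58 0 0 2831312 46408 5 983 582 56 3211 0 492 0 0 0 0 1413 4797 1027 69 31 0
55 0 0 2830944 56064 2 649 656 3 806 0 121 0 0 0 0 1441 4627 989 69 31 0
57 0 0 2827704 48760 4 818 723 6 800 0 121 0 0 1 0 1606 4316 1160 66 34 0
56 0 0 2824712 47512 6 857 604 56 1736 0 261 0 0 1 0 1584 4939 1086 68 32 0
58 0 0 2813400 47056 7 856 673 33 2374 0 355 0 0 0 0 1676 5112 1114 70 30 0
60 1 0 2816712 49464 7 861 720 6 731 0 110 7 0 3 0 2329 6131 1067 64 36 0
58 0 0 2817552 48392 4 585 521 0 996 0 146 0 0 0 0 1357 6724 1059 71 29 0
"""


def summarize_vmstat(text):
    """Average the run queue and CPU columns, ignoring the first data line."""
    lines = text.strip().splitlines()
    samples = [line.split() for line in lines[2:]]  # skip header and first data line
    count = len(samples)

    def mean(index):
        return sum(int(row[index]) for row in samples) / count

    # "sy" appears twice in the header, so the CPU columns are taken by
    # position from the end of the row: us, sy, id.
    return {
        "samples": count,
        "run_queue": mean(0),
        "user": mean(-3),
        "system": mean(-2),
        "idle": mean(-1),
    }


def looks_cpu_bound(summary, ncpus, max_idle=5.0):
    """Assumed rule of thumb: more runnable processes than CPUs and almost no idle time."""
    return summary["run_queue"] > ncpus and summary["idle"] < max_idle


summary = summarize_vmstat(VMSTAT_SAMPLE)
print(summary, looks_cpu_bound(summary, ncpus=4))  # ncpus=4 is hypothetical
```

Averaging over the interval lines smooths out a single busy sample, which is why the check uses the mean run queue rather than any one row.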
The mpstat command reports per-processor statistics, with each row of the table representing the activity of one processor.
$ mpstat 5
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0   20   0 3592 3350 2338 1355   43  184  285   0  4578   9   6  1  84
  1   19   0  304  465  283 2139  135  398  140   0  6170   9   6  1  85
  2   25   0  352  507  295 2153  158  433  183   0  7508  12   7  1  81
  3   26   0  357  513  302 2082  155  425  181   0  7460  12   7  0  81
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0    3   0 3879 3773 2754 1832   61  322  339   0  3424  12   7  0  81
  1    2   0  555  544  264 3040  197  670  112   0  4828  15   6  0  78
  2   11   0  188  595  269 3141  219  738  121   0  5291  18   6  1  75
  3   65   0  185  585  279 2660  211  673  110   0  5420  22   9  0  69
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0    6   0 4028 3633 2620 1695   51  287  343   0  2857  12   8  0  80
  1    7   0  150  545  265 3044  196  663  117   0  4374  14   4  0  81
  2   14   0  226  602  279 2823  225  707  103   0  4715  22   4  1  73
  3    2   0  125  600  282 2810  230  699  118   0  4665  18   4  0  78
mpstat identifies what each CPU is spending its time doing: for example, the distribution of system, user, wait, and idle time, system calls made, lock contention, interrupts, faults, and cross calls.
See the mpstat(1M) man page for details of each column.
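To check whether all CPUs are equally busy, the mpstat 5 example above can be averaged per CPU. The Python sketch below does this for any column. The rows are copied from that example; the function names are illustrative assumptions. The first report is skipped because it summarizes activity since boot.

```python
# Rows copied from the mpstat 5 example (three reports).
MPSTAT_SAMPLE = """\
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 20 0 3592 3350 2338 1355 43 184 285 0 4578 9 6 1 84
1 19 0 304 465 283 2139 135 398 140 0 6170 9 6 1 85
2 25 0 352 507 295 2153 158 433 183 0 7508 12 7 1 81
3 26 0 357 513 302 2082 155 425 181 0 7460 12 7 0 81
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 3879 3773 2754 1832 61 322 339 0 3424 12 7 0 81
1 2 0 555 544 264 3040 197 670 112 0 4828 15 6 0 78
2 11 0 188 595 269 3141 219 738 121 0 5291 18 6 1 75
3 65 0 185 585 279 2660 211 673 110 0 5420 22 9 0 69
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 6 0 4028 3633 2620 1695 51 287 343 0 2857 12 8 0 80
1 7 0 150 545 265 3044 196 663 117 0 4374 14 4 0 81
2 14 0 226 602 279 2823 225 707 103 0 4715 22 4 1 73
3 2 0 125 600 282 2810 230 699 118 0 4665 18 4 0 78
"""


def per_cpu_average(text, column, skip_first_report=True):
    """Average one mpstat column per CPU across the reports in text."""
    totals, counts = {}, {}
    header = None
    reports_seen = 0
    for line in text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":  # a header line starts a new report
            header = fields
            reports_seen += 1
            continue
        if skip_first_report and reports_seen == 1:
            continue
        cpu = int(fields[0])
        value = int(fields[header.index(column)])
        totals[cpu] = totals.get(cpu, 0) + value
        counts[cpu] = counts.get(cpu, 0) + 1
    return {cpu: totals[cpu] / counts[cpu] for cpu in totals}


def imbalance(averages):
    """Ratio of the busiest CPU's value to the mean across all CPUs."""
    mean = sum(averages.values()) / len(averages)
    return max(averages.values()) / mean


intr = per_cpu_average(MPSTAT_SAMPLE, "intr")
print(intr, round(imbalance(intr), 2))
```

On this sample CPU 0 handles far more interrupts than the other three, which is the kind of unequal distribution the per-processor view is meant to expose.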
The iostat command reports disk usage. Each row of the table represents the activity of one disk. Frequently used options include:
Table 1: iostat Options
iostat also reports NFS activity, although this can make the output rather long.
The truss(1) utility executes a specified command and produces a trace of the system calls it performs, the signals it receives, and the machine faults it incurs.
truss can also follow the execution of an existing process. It is a very useful tool to narrow down what resources an application is requesting from the kernel that are slow or are used to excess.
If you don't know about truss, then read the man page and give it a try. The -m option is very useful for showing faults such as page faults. The -c option gives a summary of:
· System calls
· Cumulative times spent in each system call type
· Number of failed system calls
Kernel locks protect data structures against concurrent updates and control access to resources such as disk caches, network caches, and various kernel caches.
lockstat executes a command and reports all kernel lock activity for the duration of that command, irrespective of the process or device that made the request for a lock. See the lockstat(1M) man page. The option -s 10 reports the stack of the kernel threads contending on each lock.
trapstat is a tool to provide runtime trap statistics on an UltraSPARC® processor running an otherwise stock Solaris kernel. For I-TLB and D-TLB misses, trapstat can optionally display the amount of time spent in the operating system's TLB miss handler. For interrupt vector traps, trapstat can optionally display the interrupting device.
For C, C++, and FORTRAN applications, try compiling with -xpg and executing the program with a typical workload that demonstrates the performance problem. Run gprof on the generated gmon.out file. This will show where the application is spending most of its time.
Forte TeamWare (formerly Sun WorkShop TeamWare, now part of Sun Studio developer tools) has a number of useful tools, such as the analyzer which provides a graphical representation of where the application is spending its time. For further details, see Sun Studio and Forte TeamWare Documentation and Rajat Garg and Ilya Sharapov's Sun BluePrints book, Techniques for Optimizing Applications: High Performance Computing.
The proc tools are utilities that exercise features of /proc, reporting attributes of a process such as:
· pstack - the call stack
· ptree - a tree of process relationships
· pfiles - a list of open file descriptors
· pldd - a list of dynamic libraries in use by the running processes
See the proc(1) man page for more information.
From a performance point of view, the ability to run 64-bit applications has two main benefits. The first is that much larger problems can be solved efficiently using a bigger process address space. The second is that integer arithmetic computations get to use 64-bit registers and operations.
Overall, programs get slightly larger due to larger pointer values in code and data structures. This, in turn, means that CPU caches are a little less likely to have enough cache lines, and a slight slowdown might occur in programs that could run just as well in a 32-bit environment.
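The effect of larger pointers on cache use can be worked through with simple arithmetic. The Python sketch below computes the size of a hypothetical list node (one 4-byte integer and two pointers) under 32-bit and 64-bit pointer sizes, using natural alignment. The node layout and the 64-byte cache line are assumptions for illustration only.

```python
def struct_size(member_sizes):
    """Size of a C-style struct whose members are naturally aligned scalars."""
    offset = 0
    largest = 1
    for size in member_sizes:
        offset = -(-offset // size) * size  # round up to the member's alignment
        offset += size
        largest = max(largest, size)
    return -(-offset // largest) * largest  # pad the tail to the largest alignment


CACHE_LINE = 64  # bytes; assumed line size

node_32 = struct_size([4, 4, 4])  # int plus two 4-byte pointers
node_64 = struct_size([4, 8, 8])  # int plus two 8-byte pointers

print(node_32, CACHE_LINE // node_32)
print(node_64, CACHE_LINE // node_64)
```

Under these assumptions the node doubles in size, so fewer nodes fit in each cache line. This is the mechanism behind the slight slowdown for programs that do not need the larger address space.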
In the 64-bit kernel, kernel thread stacks are 16 Kbytes rather than 8 Kbytes, though the effect is often negligible.
Examining a Solaris system to determine the amount of memory that is free has traditionally been an area of confusion.
For releases before the Solaris 8 OE, to look for a shortage of memory, do not rely upon the "free" column or the "sr" column alone. A low value in the "free" column is not an indication of a lack of memory. The page cache is holding onto pages in case they may be needed again. The VM subsystem will only reclaim memory when needed.
Much has been written on this subject in the SunWorld articles and Sun Performance and Tuning - Java and the Internet. To determine if there is a lack of memory, examine the 12th column ("sr" or scan rate) in conjunction with I/O traffic to the swap partitions (using iostat -P) on disk. The "sr" column may have high figures if a large amount of I/O is being generated through the file system and the page scanner needs to run in order to free up pages for I/O.
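The idea of reading the scan rate together with swap I/O can be expressed as a small check. The Python sketch below flags a possible memory shortage only when both signals are present in the same sampling intervals. The function name and the 50 percent cutoff are assumptions for illustration, not thresholds taken from Solaris documentation.

```python
def memory_shortage_suspected(scan_rates, swap_io_kb, min_fraction=0.5):
    """Flag a shortage when scanning and swap-device I/O coincide often enough.

    scan_rates  -- "sr" values from successive vmstat intervals
    swap_io_kb  -- Kbytes transferred to and from swap partitions in the same intervals
    """
    if len(scan_rates) != len(swap_io_kb):
        raise ValueError("both series must cover the same intervals")
    if not scan_rates:
        return False
    both = sum(1 for sr, io in zip(scan_rates, swap_io_kb) if sr > 0 and io > 0)
    return both / len(scan_rates) >= min_fraction


# Scanner busy but no swap traffic: consistent with heavy file system I/O.
print(memory_shortage_suspected([492, 121, 121, 261], [0, 0, 0, 0]))
# Scanner busy and swap traffic in every interval: worth investigating.
print(memory_shortage_suspected([492, 121, 121, 261], [800, 650, 400, 900]))
```

The swap I/O figures in the second call are made up; in practice they would come from iostat output for the swap partitions.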
The pageout scanner runs only when the free list shrinks below a threshold (lotsfree, in pages). Any process or file that is inactive and not locked in memory may be paged out. The size of the free list will appear to shrink and will remain at that value (lotsfree). The page daemon starts to scan for memory to be reclaimed from the page cache, and from exited and idle processes, when the amount on the free list drops below the lotsfree threshold. There is no way for the "free" value to grow much above the threshold, because there is no way to get the page scanner to reclaim memory beyond the threshold. It is more efficient for pages to be left in the page cache, rather than needlessly put on the free list.
The Solaris 8 OE implements a more efficient algorithm within the segmap driver to provide the pages required for I/O. The "free" column in vmstat really reflects memory that is free and not used by the page cache. The -p option has been added to vmstat to give a more accurate breakdown of paging behavior.
The pmap command reports the address space layout of an individual process (the -x option is useful).
Priority paging was introduced with the Solaris 7 OE and was back-ported to the Solaris 2.6 OE (kernel patch 105181-XX) and the Solaris 2.5.1 OE (kernel patch 103640-XX). Recent versions of both patches are available from the SunSolve Online program.
Priority paging provides an improved paging algorithm that can significantly enhance system response when the file system is being used. Priority paging introduces a new additional watermark, cachefree. The paging parameters are now:
minfree < desfree < lotsfree < cachefree
By default the new behavior is turned off in the Solaris 2.5.1, 2.6, and 7 Operating Environments, so it is important to enable this functionality on systems that are paging noticeably. cachefree is set to lotsfree if priority_paging is not enabled. If it is enabled, then cachefree is set to 2 times lotsfree by default.
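The relationship between the watermarks can be sketched as follows. The rule for cachefree comes from the text above. The defaults used here for desfree (half of lotsfree) and minfree (half of desfree) are assumptions for illustration, as is the function name.

```python
def paging_watermarks(lotsfree, priority_paging=False, desfree=None, minfree=None):
    """Return the paging watermarks, in pages, for a given lotsfree."""
    if desfree is None:
        desfree = lotsfree // 2  # assumed default
    if minfree is None:
        minfree = desfree // 2  # assumed default
    # cachefree equals lotsfree unless priority paging is enabled,
    # in which case it defaults to twice lotsfree.
    cachefree = 2 * lotsfree if priority_paging else lotsfree
    return {
        "minfree": minfree,
        "desfree": desfree,
        "lotsfree": lotsfree,
        "cachefree": cachefree,
    }


print(paging_watermarks(2048))
print(paging_watermarks(2048, priority_paging=True))
```

With priority paging disabled, cachefree collapses onto lotsfree, so the strict ordering minfree < desfree < lotsfree < cachefree holds only when the feature is enabled.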
Enabling priority paging tends to make switching between windows on desktop systems faster, and it is a big help for systems running databases that read large files into memory from the file system. For systems that perform a large amount of I/O through a file system, speed increases of several hundred percent have been seen for compute-intensive jobs with a large data set.
The Solaris 8 OE uses a different algorithm, which removes the limiting factor of previous releases where the page scanner had to scan for memory to supply the segmap driver with memory in which to place I/O. All pages that the segmap no longer uses are put on a list allowing immediate reuse. Do not set priority_paging in the Solaris 8 OE. In addition, the Solaris 8 OE should not require tuning of virtual memory parameters, except on large systems where setting fastscan and maxpgio to higher values may be beneficial.
For more information on priority paging, refer to Sun Performance, Priority Paging Frequently Asked Questions.
ISM locks the shared memory in physical memory so that it cannot be paged out. Memory management data structures that are normally created on a per-process basis are created once and then shared by every process. In the Solaris 2.6 OE, a further optimization takes place as the kernel tries to find 4-Mbyte contiguous blocks of physical memory that can be used as large pages to map the shared memory. This greatly reduces memory management unit overhead. (See page 333 of Sun Performance and Tuning - Java and the Internet.) By default, applications such as Oracle, Informix, and Sybase use a special flag to specify that they want ISM.
ISM is an important optimization that makes more efficient use of the kernel and hardware resources involved in the implementation of virtual memory. In addition, ISM provides a means of keeping heavily used shared pages locked in memory.
Intimate shared memory is enabled by default, and there is no need to edit the /etc/system file to turn on this feature. In a kernel with current patch levels, turning off ISM can cause system degradation and possibly a hang condition. In addition, database configuration files, such as Oracle's init.ora file, should not have use_ism=false because it turns off ISM.
To understand swap configurations related to shared memory, see "Clearing Up Swap Space Confusion" by Adrian Cockcroft.
The two primary considerations in setting swap space size are to have enough:
1. Memory to avoid swapping in common operation
2. Swap to get a crash dump
The values for the following IPC parameters need to be determined by your database administrator (DBA). Sun Solution Centers cannot give recommendations for what the actual IPC parameter settings should be. These values are application dependent.
It is extremely easy to mistype the /etc/system setting for IPC parameters. Such an error can have a significant performance impact on the application. To check for a typo, trawl through /var/adm/messages for a message of the form:
genunix: [ID 492708 kern.notice] sorry, variable 'seminfo_semopn'
is not defined in the 'semsys'
This indicates a typo in the corresponding /etc/system line. Use grep to search /var/adm/messages for the word "sorry".
The Solaris 8 OE has better default IPC values than previous releases.
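The same search can be scripted so that the misspelled variable and its module are reported directly. The Python sketch below matches the message form shown above. The timestamp, host name, and second log line in the sample are made up for illustration, and the pattern tolerates the message being wrapped across two lines.

```python
import re

# Matches: sorry, variable '<name>' is not defined in the '<module>'
# \s+ allows the message to be wrapped across lines, as printed above.
BAD_SETTING = re.compile(
    r"sorry, variable '([^']+)'\s+is not defined in the '([^']+)'"
)

# Hypothetical excerpt from /var/adm/messages.
LOG_SAMPLE = (
    "Jan 10 09:15:02 host genunix: [ID 492708 kern.notice] "
    "sorry, variable 'seminfo_semopn'\n"
    "is not defined in the 'semsys'\n"
    "Jan 10 09:15:03 host unix: [ID 100000 kern.info] unrelated message\n"
)


def find_bad_settings(log_text):
    """Return (variable, module) pairs for rejected /etc/system settings."""
    return BAD_SETTING.findall(log_text)


for variable, module in find_bad_settings(LOG_SAMPLE):
    print("check /etc/system:", variable, "in module", module)
```

On a live system the text would come from reading /var/adm/messages rather than from an embedded string.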
For releases prior to the Solaris 2.6 OE, more swap space (as "backing store") is needed for shared memory. The swap -l command reports sizes in 512-byte blocks, so divide the block counts by 2 to get kilobytes, or by 2048 to get megabytes. There should be at least 2 times the amount of swap available for allocated shared memory (shmmax).
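The arithmetic can be sketched in Python. The fragment assumes that swap -l reports sizes in 512-byte blocks, so 2048 blocks make one megabyte. The sample swap -l output is hypothetical, the function names are illustrative, and using the total "blocks" column rather than the "free" column for the check is an assumption.

```python
BLOCK_BYTES = 512  # swap -l counts 512-byte blocks

# Hypothetical output of: swap -l
SWAP_L_SAMPLE = """\
swapfile             dev  swaplo blocks   free
/dev/dsk/c0t0d0s1   32,1      16 1049312 1049312
"""


def blocks_to_mbytes(blocks):
    """Convert a count of 512-byte blocks to megabytes."""
    return blocks * BLOCK_BYTES / (1024 * 1024)


def total_swap_blocks(swap_l_output):
    """Sum the "blocks" column over all swap devices."""
    total = 0
    for line in swap_l_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) >= 5:
            total += int(fields[3])
    return total


def swap_covers_shared_memory(swap_blocks, shm_bytes, factor=2):
    """True when swap is at least `factor` times the shared memory size."""
    return swap_blocks * BLOCK_BYTES >= factor * shm_bytes


blocks = total_swap_blocks(SWAP_L_SAMPLE)
print(round(blocks_to_mbytes(blocks), 1), "Mbytes of swap")
print(swap_covers_shared_memory(blocks, 256 * 1024 * 1024))
```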
Here are the default and maximum values for shmmax:
Parameter  Default            Maximum                Releases
shmmax     1048576 (1 Mbyte)  4294967295 (4 Gbytes)  2.5.1, 2.6, 32-bit Solaris 7
                              2147483647 (2 Gbytes)  2.5 or lower
In the Solaris 2.6 OE and in the 32-bit Solaris 7 OE, shmmax and shmmin are unsigned integers (32 bit). In the 64-bit Solaris 7 OE, shmmax and shmmin are unsigned longs (64 bit). In all cases, shmmni and shmseg are signed integers (32 bit). Table 2 summarizes these parameters and their types.
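The practical consequence of these types is the range of values each parameter can hold. The Python sketch below encodes the ranges described above and checks whether a proposed value fits. The function names and the release strings are illustrative assumptions, and the check covers only the range of the data type, not whether a value is sensible for a given system.

```python
UINT32_MAX = 2**32 - 1  # 4294967295
INT32_MAX = 2**31 - 1   # 2147483647
UINT64_MAX = 2**64 - 1


def shmmax_ceiling(release, kernel_bits=32):
    """Largest value the shmmax/shmmin type can hold for a release."""
    if release == "2.6":
        return UINT32_MAX
    if release == "7":
        return UINT64_MAX if kernel_bits == 64 else UINT32_MAX
    raise ValueError("release not covered by this sketch: %r" % release)


def fits(parameter, value, release, kernel_bits=32):
    """True when value is within the range of the parameter's data type."""
    if parameter in ("shmmax", "shmmin"):
        return 0 <= value <= shmmax_ceiling(release, kernel_bits)
    if parameter in ("shmmni", "shmseg"):
        return -(2**31) <= value <= INT32_MAX  # signed 32-bit in all cases
    raise ValueError("unknown parameter: %r" % parameter)


print(fits("shmmax", 2**32, "7", kernel_bits=32))
print(fits("shmmax", 2**32, "7", kernel_bits=64))
```

A value that overflows the type would not take effect as intended, so a range check of this kind is a cheap safeguard before editing /etc/system.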